# | Feature | Description |
---|---|---|
0 | utterance length | number of words in the line |
1 | average word length | average length of words in the line |
2 | word diversity | type-token ratio for this line |
3 | stop words ratio | percentage of words in this line that are stop words |
4 | neologisms ratio | percentage of words in this line that are not in our vocabulary |
5 | number of numbers | how many numbers this line contains |
6 | number of profanity words | how many profanity words this line contains |
7 | subjectivity | subjectivity score from TextBlob |
8 | polarity | polarity score from TextBlob |
9 | question count | number of sentences in this line that are questions |
10 | exclamation count | number of sentences in this line that end in exclamation marks |
11 | ellipses count | number of ellipses this line contains |
12 to 12+N-1 | top words | number of words in this line that are also in each character's top 20 most frequent words, for the N main characters of the show |
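
As a rough illustration of how several of the features in the table could be computed for a single line of dialogue, here is a minimal sketch using only the Python standard library. The tokenizer, the sentence splitter, the tiny `STOP_WORDS` set, and the function names `line_features` and `top_words` are all assumptions made for this example; the source does not specify its stop-word list, vocabulary, or tokenization. The profanity count (feature 6) and the TextBlob scores (features 7 and 8) are left out so that the sketch needs no external resources.

```python
import re
from collections import Counter

# Hypothetical stand-in: the source does not say which stop-word list it uses.
STOP_WORDS = {"the", "a", "an", "is", "are", "i", "you", "it", "to", "of", "and"}


def tokenize(line):
    """Return lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", line.lower())


def top_words(lines, k=20):
    """Return the k most frequent words across one character's lines."""
    counts = Counter(w for line in lines for w in tokenize(line))
    return {w for w, _ in counts.most_common(k)}


def line_features(line, vocabulary, top_words_by_character):
    """Compute a subset of the table's features for one line of dialogue."""
    words = tokenize(line)
    n = len(words)
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", line.strip()) if s]

    features = {
        "utterance_length": n,
        "avg_word_length": sum(len(w) for w in words) / n if n else 0.0,
        "word_diversity": len(set(words)) / n if n else 0.0,  # type-token ratio
        "stop_words_ratio": sum(w in STOP_WORDS for w in words) / n if n else 0.0,
        "neologisms_ratio": sum(w not in vocabulary for w in words) / n if n else 0.0,
        "number_count": len(re.findall(r"\d+(?:\.\d+)?", line)),
        "question_count": sum(s.endswith("?") for s in sentences),
        "exclamation_count": sum(s.endswith("!") for s in sentences),
        "ellipses_count": len(re.findall(r"\.{3,}|…", line)),
    }
    # One feature per main character: overlap with that character's top words.
    for character, char_top_words in top_words_by_character.items():
        features[f"top_words_{character}"] = sum(w in char_top_words for w in words)
    return features


example = line_features(
    "Bazinga! Is it 3 o'clock... or not?",
    vocabulary={"is", "it", "or", "not", "o'clock"},
    top_words_by_character={"sheldon": {"bazinga", "it"}},
)
```

In this sketch the ratios are returned as fractions between 0 and 1 rather than percentages, and empty lines yield 0.0 instead of raising a division error. If TextBlob is installed, the two sentiment features could be read from `TextBlob(line).sentiment.subjectivity` and `TextBlob(line).sentiment.polarity`.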

Amy (3,473), Bernadette (2,687), Howard (5,858), Leonard (9,765), Penny (7,659), Raj (4,680), Sheldon (11,703)
Bart (13,139), Homer (28,447), Lisa (10,945), Marge (13,367), Ned Flanders (2,057)
Bree (4,130), Gabrielle (4,564), Lynette (4,618), Susan (5,125)