- Text from http://www.chakoteya.net/StarTrek/
- One line or action is the observation
- Each row will have the following features:
- episode name
- season
- location
- character
- character's line
- some lines are captain's logs or actions
- Location is denoted by square brackets like `[Bridge]`, `[Transporter room]`, or `[Ferengi Science Lab]`, followed by dialog until the next location is denoted.
- Actions are denoted by parentheses like `(Bok turns up the device)` or `(Crusher puts two small devices on his forehead, turns the lights out and leaves)`
- most actions are on their own line, ending with a newline character
- some actions are in-line with a character's line
- Characters are denoted by all caps followed by a colon like `PICARD:`, `MCCOY:`, or `Q:`
- One or more capital letters ending with a colon
- When a character is in a costume, the costume follows the name in parentheses, like `Q (JUDGE):`
- Some names may have numbers
- When a character is heard over communications, the name is followed by `[OC]`, like `PICARD [OC]:`
- Logs
- Captain's logs look like `Captain's log, stardate 41153.7. Preparing to detach...`
- Crew logs look like `Personal log, Commander William Riker. Stardate 41153.7.` on their own line.
- Stardate is the word "stardate" followed by a number with exactly one decimal place, like `41153.7`
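The conventions above can be sketched as a small line-by-line parser. This is a minimal sketch, not a tested scraper: the regular expressions, the `Row` type, and `parse_transcript` are hypothetical names, the dialog in `sample` is made up for illustration, and the sketch assumes each line of dialog sits on a single line. Episode name and season are not in the line format, so they would have to come from the page or file being parsed.

```python
import re
from typing import Iterator, NamedTuple, Optional

# Patterns derived from the conventions listed above (names are illustrative).
LOCATION_RE = re.compile(r"^\[([^\]]+)\]$")     # [Bridge]
ACTION_RE = re.compile(r"\(([^)]*)\)")           # (Bok turns up the device)
SPEAKER_RE = re.compile(
    r"^([A-Z][A-Z0-9' .-]*?)"    # name in capitals, may contain digits
    r"(?:\s*\(([A-Z ]+)\))?"     # optional costume, e.g. (JUDGE)
    r"(?:\s*\[(OC)\])?"          # optional over-communications marker
    r":\s*(.*)$"                 # colon, then the spoken text
)
LOG_RE = re.compile(r"^(Captain's|Personal) log\b", re.IGNORECASE)
STARDATE_RE = re.compile(r"stardate\s+(\d+\.\d)\b", re.IGNORECASE)


class Row(NamedTuple):
    location: Optional[str]
    kind: str                    # "line", "action", or "log"
    character: Optional[str]
    text: str
    stardate: Optional[str] = None


def parse_transcript(raw: str) -> Iterator[Row]:
    """Yield one Row per spoken line, action, or log entry."""
    location = None
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        loc = LOCATION_RE.match(line)
        if loc:
            location = loc.group(1)      # applies until the next location
            continue
        if LOG_RE.match(line):
            found = STARDATE_RE.search(line)
            yield Row(location, "log", None, line,
                      found.group(1) if found else None)
            continue
        if ACTION_RE.fullmatch(line):    # action on its own line
            yield Row(location, "action", None, line[1:-1])
            continue
        spk = SPEAKER_RE.match(line)
        if spk:
            name, _costume, _oc, text = spk.groups()
            spoken = " ".join(ACTION_RE.sub("", text).split())
            if spoken:
                yield Row(location, "line", name, spoken)
            for action in ACTION_RE.findall(text):   # in-line actions
                yield Row(location, "action", name, action)


# Made-up sample that follows the format; not a real transcript excerpt.
sample = """[Bridge]
Captain's log, stardate 41153.7. Preparing to detach...
PICARD: Engage. (sits down)
(Bok turns up the device)
[Transporter room]
PICARD [OC]: Report.
Q (JUDGE): Silence!
"""
rows = list(parse_transcript(sample))
```

Lines that match none of the patterns are silently skipped here; a real pass over the transcripts would probably want to log them to find formatting exceptions.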
- Who speaks the most?
- Sentiment by character?
- Does the sentiment of a character change over time?
- Does the sentiment of episode get more positive/negative over time?
- Topics by speaker? Who is talking about what?
- Topics by background (Federation vs. other groups)
- Who takes the most actions? What kinds of actions?
- Character prediction classifier
- Make a model that takes in text and returns a prediction of which character said it
- predict_character("Honor above all") => "Worf"
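One way the `predict_character` idea could look is sketched below. The notes do not name a model, so this uses a small multinomial Naive Bayes with add-one smoothing written in plain Python as a stand-in; a TF-IDF pipeline with a library classifier would be a natural replacement. `CharacterClassifier`, `tokenize`, and the training lines are all hypothetical: the lines are invented placeholders, not transcript quotes.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


class CharacterClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)   # character -> token counts
        self.line_counts = Counter()              # character -> number of lines
        self.vocab = set()

    def fit(self, lines):
        for character, text in lines:
            tokens = tokenize(text)
            self.line_counts[character] += 1
            self.word_counts[character].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_lines = sum(self.line_counts.values())
        best, best_score = None, -math.inf
        for character, n_lines in self.line_counts.items():
            counts = self.word_counts[character]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_lines / total_lines)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best, best_score = character, score
        return best


# Invented placeholder lines for illustration only.
training = [
    ("Worf", "There is no honor in retreat"),
    ("Worf", "A warrior values honor above life"),
    ("Picard", "Make it so"),
    ("Picard", "Set a course and engage"),
    ("Data", "I am an android and do not feel emotion"),
    ("Data", "Curious, I do not understand the joke"),
]
model = CharacterClassifier().fit(training)


def predict_character(text):
    return model.predict(text)


print(predict_character("Honor above all"))  # → Worf
```

With real data the classes will be heavily imbalanced (captains speak far more than guest characters), so restricting the label set to the main cast may be a reasonable first step.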
-
Analytics and Exploration
- Analytics about who talks the most, least, etc.
- Analytics about Locations
- What locations show up most frequently in episodes?
- Analytics about only the actions
- n-grams
- for exploration
- join the words of each n-gram with a "-" character so the n-grams become single tokens that the tf-idf vectorization can include in modeling
- skip grams
- Word clouds
- by episode
- by season
- by character
- by character in 1st season vs. last season
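A few of the exploration ideas above can be sketched with the standard library alone. The helper names (`speaker_counts`, `hyphen_ngrams`, `skip_bigrams`) are hypothetical, the input is assumed to be `(character, text)` pairs that have already been parsed and tokenized on whitespace, and the example rows are made up.

```python
from collections import Counter


def speaker_counts(rows):
    """Count lines per character from (character, text) pairs."""
    return Counter(character for character, _ in rows)


def hyphen_ngrams(tokens, n=2):
    """Join each run of n consecutive tokens with "-" into one token."""
    return ["-".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skip_bigrams(tokens, max_skip=1):
    """Pair each token with the tokens that follow it, skipping at most max_skip."""
    pairs = []
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + 2 + max_skip, len(tokens))):
            pairs.append((left, tokens[j]))
    return pairs


# Made-up rows for illustration only.
rows = [
    ("PICARD", "make it so number one"),
    ("RIKER", "aye sir"),
    ("PICARD", "engage"),
]
print(speaker_counts(rows).most_common(1))
# → [('PICARD', 2)]
print(hyphen_ngrams(rows[0][1].split(), n=2))
# → ['make-it', 'it-so', 'so-number', 'number-one']
print(skip_bigrams(["a", "b", "c"], max_skip=1))
# → [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Appending the hyphenated n-grams to each line's token list before vectorizing is one way to get them into the tf-idf vocabulary as single terms.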
-
Topic modeling
- compare and contrast
- per series type (TOS, TNG, DS9, Voyager, etc...)
- per season
- per episode
- by character
-
Sentiment analysis
- compare and contrast sentiment
- by episode
- by character
- by season
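The grouping step behind these comparisons might look like the sketch below. The scorer is deliberately a toy: `POSITIVE` and `NEGATIVE` are tiny invented word lists standing in for whatever sentiment model is eventually chosen, and `score` and `mean_sentiment` are hypothetical names. Rows are assumed to be dicts with `episode`, `season`, `character`, and `text` keys; the example rows are made up.

```python
from collections import defaultdict
from statistics import mean

# Toy lexicon (an assumption for illustration, not a real sentiment resource).
POSITIVE = {"good", "excellent", "welcome", "glad"}
NEGATIVE = {"bad", "danger", "afraid", "destroy"}


def score(text):
    """Return (positive words - negative words) / total words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def mean_sentiment(rows, key):
    """Average the score of every row sharing the same value for key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(score(row["text"]))
    return {group: mean(scores) for group, scores in groups.items()}


# Made-up rows for illustration only.
rows = [
    {"episode": "E1", "season": 1, "character": "A", "text": "Excellent work"},
    {"episode": "E1", "season": 1, "character": "B", "text": "We are in danger"},
    {"episode": "E2", "season": 2, "character": "A", "text": "We are in danger"},
]
print(mean_sentiment(rows, "character"))
# → {'A': 0.125, 'B': -0.25}
print(mean_sentiment(rows, "season"))
# → {1: 0.125, 2: -0.25}
```

Swapping `key` between `"episode"`, `"character"`, and `"season"` covers the three comparisons listed; tracking change over time would only need grouping on a `(character, season)` pair instead.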
-
Dialog Generator
- Episode generator
- Character-based chatbot, such as a Picard chatbot
- Acquire all episodes from a specific series
- MVP Prepare
- get Captain's logs
- get all character lines
- Explore
- Modeling:
- train a character prediction classifier