NLP Trek

Analyzing Star Trek Transcripts with Natural Language Processing

Locations are denoted by square brackets, like [Bridge], [Transporter room], or [Ferengi Science Lab], and are followed by dialog until the next location is denoted.
Actions are denoted by parentheses, like (Bok turns up the device) or (Crusher puts two small devices on his forehead, turns the lights out and leaves)
- Most actions are on their own line
- Some actions appear inline, within a character's line
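The location and action rules can be sketched as follows. This is a minimal sketch under two assumptions: a location header is a bracketed label alone on its line, and an action is any parenthesised span, whether on its own line or inline. The helper names `split_scenes` and `extract_actions` are hypothetical.

```python
import re

# Assumed conventions: a location header is a bracketed label alone on its line,
# and an action is any parenthesised span, either on its own line or inline.
LOCATION_RE = re.compile(r"^\[([^\]]+)\]$")
ACTION_RE = re.compile(r"\(([^)]*)\)")


def split_scenes(lines):
    """Group non-empty lines under the most recent location header.

    Lines that appear before the first header are skipped.
    """
    scenes = []
    current = None
    for raw in lines:
        line = raw.strip()
        match = LOCATION_RE.match(line)
        if match:
            current = {"location": match.group(1), "lines": []}
            scenes.append(current)
        elif current is not None and line:
            current["lines"].append(line)
    return scenes


def extract_actions(line):
    """Split a line into its parenthesised actions and the remaining text."""
    actions = [action.strip() for action in ACTION_RE.findall(line)]
    remaining = ACTION_RE.sub("", line)
    # Collapse the double space left behind when an inline action is removed.
    remaining = re.sub(r"\s{2,}", " ", remaining).strip()
    return actions, remaining
```

For example, `extract_actions("PICARD: Make it so. (sits) Engage.")` returns `(["sits"], "PICARD: Make it so. Engage.")`. The sample line is made up for illustration.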
Characters are denoted by all caps followed by a colon, like PICARD:, MCCOY:, or Q:
- One or more capital letters ending with a colon
- When a character is in costume: Q (JUDGE):
- Some names may include numbers
- When a character is heard over communications: PICARD [OC]:
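A sketch of a speaker-line parser for these conventions, with a per-speaker line count as a first pass at counting who speaks the most. It rests on three assumptions. Names may also contain digits, spaces, apostrophes, periods and hyphens. A costume sits in parentheses before the colon. The communications tag is bracketed. `parse_speaker` and `lines_per_speaker` are hypothetical names, and the sample lines are made up.

```python
import re
from collections import Counter

# Speaker line: NAME, optional (COSTUME), optional [TAG], then a colon and the dialog.
# Assumption: names may contain digits, spaces, apostrophes, periods and hyphens.
SPEAKER_RE = re.compile(
    r"^(?P<name>[A-Z][A-Z0-9'.\- ]*?)"
    r"(?:\s*\((?P<costume>[A-Z0-9'.\- ]+)\))?"
    r"(?:\s*\[(?P<tag>[A-Z]+)\])?"
    r":\s*(?P<text>.*)$"
)


def parse_speaker(line):
    """Return speaker details for a dialog line, or None if the line is not dialog."""
    match = SPEAKER_RE.match(line.strip())
    if match is None:
        return None
    return {
        "speaker": match.group("name").strip(),
        "costume": match.group("costume"),         # e.g. "JUDGE" for Q (JUDGE):
        "over_comms": match.group("tag") == "OC",  # e.g. PICARD [OC]:
        "text": match.group("text").strip(),
    }


def lines_per_speaker(lines):
    """Count dialog lines per speaker; costume and [OC] variants count toward the base name."""
    counts = Counter()
    for line in lines:
        parsed = parse_speaker(line)
        if parsed is not None:
            counts[parsed["speaker"]] += 1
    return counts
```

Lowercase text in a line stops a match, so log lines like "Captain's log, ..." and action lines in parentheses return `None`. Counting words instead of lines would only change the `+= 1` to `+= len(parsed["text"].split())`.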
Logs
- Captain's logs look like: Captain's log, stardate 41153.7. Preparing to detach...
- Crew logs appear on their own line and look like: Personal log, Commander William Riker. Stardate 41153.7.
A stardate is the word "stardate" followed by a number with exactly one decimal place, like 41153.7
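A minimal sketch of log and stardate detection, assuming the log openings match the two examples above exactly. It also assumes a personal log's author name runs up to the first period, so a name such as "Dr. Crusher" would be cut short. `parse_log` is a hypothetical helper.

```python
import re

# Assumption: "stardate" (any case) followed by digits and exactly one decimal digit.
STARDATE_RE = re.compile(r"\bstardate\s+(\d+\.\d)(?!\d)", re.IGNORECASE)
CAPTAINS_LOG_RE = re.compile(r"^Captain's log\b", re.IGNORECASE)
# Assumption: the author's name runs up to the first period, so it cannot contain one.
PERSONAL_LOG_RE = re.compile(r"^Personal log,\s*(?P<author>[^.]+)\.", re.IGNORECASE)


def parse_log(line):
    """Classify a log line and pull out its author and stardate; None if it is not a log."""
    text = line.strip()
    found = STARDATE_RE.search(text)
    stardate = float(found.group(1)) if found else None
    if CAPTAINS_LOG_RE.match(text):
        return {"kind": "captain", "author": None, "stardate": stardate}
    personal = PERSONAL_LOG_RE.match(text)
    if personal:
        return {"kind": "personal", "author": personal.group("author").strip(), "stardate": stardate}
    return None
```

The negative lookahead `(?!\d)` enforces the one-decimal rule, so "stardate 41153.75" is not treated as a stardate. Storing stardates as floats makes it easy to sort scenes and logs chronologically.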

Who speaks the most?
Sentiment by character?
- Does a character's sentiment change over time?
- Do episodes become more positive or negative over time?
Topics by speaker? Who is talking about what?
Topics by background (Federation vs. other groups)
Who takes the most actions? What kinds of actions?
Character prediction classifier
- Make a model that takes in text and predicts which character said it
- predict_character("Honor above all") => "Worf"
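One simple way to build `predict_character` is a from-scratch multinomial Naive Bayes classifier with add-one smoothing. The notes do not name a model, so this is one swapped-in choice, sketched rather than prescribed. There are three differences from the example call above. This version takes the trained model as its first argument. Labels come back in the transcripts' all-caps form. The training lines below are illustrative samples, not real transcript data.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase word tokens; punctuation other than apostrophes is dropped."""
    return TOKEN_RE.findall(text.lower())


def train(examples):
    """Fit a multinomial Naive Bayes model from (character, line) pairs."""
    word_counts = defaultdict(Counter)  # character -> token frequencies
    line_counts = Counter()             # character -> number of training lines
    vocab = set()
    for character, line in examples:
        tokens = tokenize(line)
        word_counts[character].update(tokens)
        line_counts[character] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "line_counts": line_counts, "vocab": vocab}


def predict_character(model, text):
    """Return the character with the highest log posterior for `text`."""
    total_lines = sum(model["line_counts"].values())
    vocab_size = len(model["vocab"])
    # Words never seen in training carry no evidence, so they are ignored.
    tokens = [t for t in tokenize(text) if t in model["vocab"]]
    best, best_score = None, -math.inf
    for character, n_lines in model["line_counts"].items():
        counts = model["word_counts"][character]
        total_words = sum(counts.values())
        score = math.log(n_lines / total_lines)  # log prior
        for token in tokens:
            # Add-one (Laplace) smoothing so unseen character/word pairs are not zero.
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best, best_score = character, score
    return best
```

Usage: `model = train([("WORF", "A Klingon never forgets his honor"), ("PICARD", "Make it so")])`, then `predict_character(model, "Honor above all")`. Working in log space avoids underflow when multiplying many small word probabilities.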

Analytics and Exploration
- Analytics about who talks the most, the least, etc.
- Analytics about locations
  - Which locations appear most frequently in episodes
- Analytics about only the actions
- n-grams
  - for exploration
  - join tokens with the "-" character so n-grams can be added to the TF-IDF vectorization and included in modeling
- skip-grams
- Word clouds
  - by episode
  - by season
  - by character
  - by character in 1st season vs. last season
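The n-gram and skip-gram items can be sketched as follows. N-grams are joined with "-" so each becomes a single feature token that a TF-IDF step can count alongside unigrams. Skip-grams are implemented as k-skip-n-grams: n tokens in order, skipping at most k tokens in total. Function names are hypothetical. For comparison, scikit-learn's `TfidfVectorizer` can build n-grams itself through `ngram_range`, though it joins them with spaces.

```python
from itertools import combinations


def ngrams(tokens, n):
    """Contiguous n-grams joined with '-' so each one is a single feature token."""
    return ["-".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n=2, k=1):
    """k-skip-n-grams: n tokens in order, skipping at most k tokens in total."""
    grams = []
    for start in range(len(tokens)):
        # Later picks must fall within n + k - 1 positions after `start`.
        window = range(start + 1, min(len(tokens), start + n + k))
        for rest in combinations(window, n - 1):
            grams.append("-".join(tokens[j] for j in (start, *rest)))
    return grams


def with_ngrams(tokens, max_n=2):
    """Unigrams plus joined n-grams up to max_n, ready for a TF-IDF step."""
    features = list(tokens)
    for n in range(2, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features
```

For example, `with_ngrams(["make", "it", "so"], max_n=3)` returns `["make", "it", "so", "make-it", "it-so", "make-it-so"]`. `skipgrams(["make", "it", "so"])` returns `["make-it", "make-so", "it-so"]`.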
Topic modeling
- compare and contrast
- per series (TOS, TNG, DS9, Voyager, etc.)
- per season
- per episode
- by character
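Before fitting a real topic model such as LDA, a lighter way to compare series, seasons, episodes, or characters is to rank each group's terms by TF-IDF. Each group is treated as one document. This is plainly not topic modeling, only a quick distinctive-terms baseline. `distinctive_terms` is a hypothetical helper, and the sample tokens are made up.

```python
import math
from collections import Counter


def distinctive_terms(groups, top_n=5):
    """Rank each group's terms by TF-IDF, treating every group as one document.

    groups: dict mapping a group name (series, season, episode, character)
    to its list of tokens.
    """
    doc_freq = Counter()
    for tokens in groups.values():
        doc_freq.update(set(tokens))
    n_docs = len(groups)
    result = {}
    for name, tokens in groups.items():
        tf = Counter(tokens)
        total = sum(tf.values())
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        # Terms shared by every group score zero and are dropped.
        result[name] = [term for term, score in ranked[:top_n] if score > 0]
    return result
```

With `{"TNG": ["warp", "engage", "borg", "warp"], "DS9": ["warp", "wormhole", "station"]}`, "warp" appears in both groups and scores zero. Each series is left with only its own terms.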
Sentiment analysis
- compare and contrast sentiment
- by episode
- by character
- by season
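Grouped sentiment (by episode, character, season, or a combination) reduces to scoring each line and averaging within groups. The sketch below uses a tiny hypothetical word list purely so it runs without downloads. For real analysis, pass a proper scorer as `score=`. One option is the compound score from NLTK's VADER, `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, which requires the `vader_lexicon` resource. The record layout is an assumption.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon for illustration only; not a validated sentiment resource.
POSITIVE = {"good", "great", "glad", "fascinating", "hope"}
NEGATIVE = {"bad", "dead", "afraid", "destroyed", "hostile"}


def toy_score(text):
    """Score in [-1, 1]: (positive - negative) / (positive + negative), 0.0 if neither."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def sentiment_by(records, keys, score=toy_score):
    """Mean sentiment per group; `keys` names the record fields to group by.

    Assumed record layout: dicts with a "text" field plus the grouping fields,
    e.g. {"character": "PICARD", "season": 1, "episode": 3, "text": "..."}.
    """
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[key] for key in keys)].append(score(record["text"]))
    return {group: mean(scores) for group, scores in groups.items()}
```

Grouping by `("character", "season")` shows directly whether a character's sentiment changes over time.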
Dialog Generator
- Episode generator
- Character-based chatbot, e.g. a Picard chatbot
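A word-level Markov chain is a simple baseline for the dialog generator. Train it on one speaker's lines and it produces new lines in that character's style. It does not respond to input, so a true chatbot would need something stronger, such as a language model. This sketch makes two assumptions: words split on whitespace, and each dialog line is treated as one sentence. `build_chain` and `generate` are hypothetical names, and the sample lines are illustrative.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"


def build_chain(lines, order=1):
    """Map each state (tuple of `order` words) to the words observed after it."""
    chain = defaultdict(list)
    for line in lines:
        padded = [START] * order + line.split() + [END]
        for i in range(len(padded) - order):
            state = tuple(padded[i:i + order])
            chain[state].append(padded[i + order])
    return chain


def generate(chain, order=1, max_words=30, rng=None):
    """Random walk from the start state until END or max_words is reached."""
    rng = rng or random.Random()
    state = (START,) * order
    words = []
    for _ in range(max_words):
        choices = chain.get(state)
        if not choices:
            break
        word = rng.choice(choices)
        if word == END:
            break
        words.append(word)
        state = state[1:] + (word,)
    return " ".join(words)
```

Usage: `generate(build_chain(picard_lines, order=2), order=2, rng=random.Random(0))`, where `picard_lines` is a hypothetical list of one speaker's dialog. A higher `order` gives more coherent but less varied output.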