This Python project tags script lines with sentiment values using different classifiers. For further processing of this data, see the enclosed Java project.
The sentiment model is based on GOP debate tweets. An NLTK naive Bayes classifier is used to classify new text. Performance is poor: about 90% of script lines are tagged as negative.
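A minimal sketch of how an NLTK naive Bayes classifier can be trained on labeled tweets and applied to new text. The helper names (`bag_of_words`, `train_and_classify`), the whitespace tokenization, and the two sample tweets are assumptions for illustration, not the project's actual code or data.

```python
def bag_of_words(text):
    """Turn a text into the feature dict that NLTK classifiers expect."""
    # Assumption: simple lowercase whitespace tokenization.
    return {f"contains({word})": True for word in text.lower().split()}


def train_and_classify(labeled_texts, new_texts):
    """Train on (text, label) pairs and return a label for each new text."""
    # Imported lazily so the feature extractor can be used without nltk.
    from nltk.classify import NaiveBayesClassifier

    train_set = [(bag_of_words(text), label) for text, label in labeled_texts]
    classifier = NaiveBayesClassifier.train(train_set)
    return [classifier.classify(bag_of_words(text)) for text in new_texts]


# Made-up stand-ins for the labeled GOP debate tweets.
tweets = [
    ("what a great answer", "positive"),
    ("that was a terrible debate", "negative"),
]

# Example call (requires nltk to be installed):
# train_and_classify(tweets, ["a great episode"])
```

A classifier trained on political tweets sees vocabulary quite different from script dialogue, which may be one reason for the skew toward negative labels.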
vaderSentiment.py uses the NLTK VADER package for sentiment analysis. The original data is tagged with these sentiment values.
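The following sketch shows one way to tag lines with NLTK's VADER analyzer. It is not the contents of vaderSentiment.py: the function names (`label_from_compound`, `tag_lines`) are hypothetical, and the ±0.05 cut-off on the compound score is the commonly used convention, assumed here rather than taken from the project.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def tag_lines(lines):
    """Return (line, compound score, label) for each script line."""
    # Imported lazily; the lexicon download needs network access once.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    analyzer = SentimentIntensityAnalyzer()
    tagged = []
    for line in lines:
        compound = analyzer.polarity_scores(line)["compound"]
        tagged.append((line, compound, label_from_compound(compound)))
    return tagged


# Example call (requires nltk and the vader_lexicon resource):
# tag_lines(["I love donuts!", "This is the worst day ever."])
```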
- Python 3.x
- nltk (download models: en, vader)
Analysis of character vocabulary using TF-IDF and Word2Vec.
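To illustrate the TF-IDF part, here is a small standard-library sketch that treats each character's combined lines as one document and scores how distinctive each word is for that character. The function name `tf_idf`, the unsmoothed formula tf × log(N / df), and the toy token lists are assumptions; the project itself may use a library implementation with different weighting.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs maps a character name to a list of tokens; returns per-character scores."""
    n_docs = len(docs)
    # Document frequency: number of characters whose vocabulary contains the term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency times inverse document frequency (no smoothing).
        scores[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


# Made-up toy vocabulary, not taken from the dataset.
vocab = {
    "homer": ["donut", "donut", "beer", "the"],
    "lisa": ["saxophone", "jazz", "the"],
}
scores = tf_idf(vocab)
top_homer = max(scores["homer"], key=scores["homer"].get)
print(top_homer)
```

Words shared by every character (here "the") get a score of zero, so the ranking surfaces character-specific vocabulary.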
Source: https://kaggle.com/wcukierski/the-simpsons-by-the-data (Author: Todd Schneider)