Follow the steps below to run:
- Clone this repository.
- Execute `sudo easy_install pip` if you don't already have pip installed.
- Execute `pip install virtualenv` to install virtualenv.
- Execute `virtualenv .` to create a virtual environment in the repository directory.
- Execute `source bin/activate` to activate the virtual environment.
- Execute `pip install -r requirements.txt` to install the project modules.
- Execute `python predict_review_sentiment.py` to start model generation and the sentiment prediction CLI.

`predict_review_sentiment.py` first splits the original data attached to the challenge into training and testing sets according to a ratio. Changing this ratio and rerunning overwrites the previously generated files.
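
A ratio-based split can be sketched as follows. This is a minimal illustration and not the code in `TrainTestDataSplitter.py`: the function name `split_by_ratio`, the `train_ratio` and `seed` parameters, and the sample records are all assumptions made for the example.

```python
import random


def split_by_ratio(records, train_ratio=0.8, seed=0):
    """Shuffle records and split them into (train, test) lists.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(records)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Made-up (text, label) pairs standing in for the challenge data.
reviews = [("review %d" % i, i % 2) for i in range(10)]
train, test = split_by_ratio(reviews, train_ratio=0.8)
print(len(train), len(test))
```

If the two halves are written to fixed file names, rerunning with a different ratio replaces the earlier files, which is consistent with the overwrite behavior noted above.
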
It then instantiates a classifier and passes it to a new `ModelGenerator`. You can change the classifier used to play around with the interface.
After generating and saving the model, then fitting and scoring the classifier, the program prompts you for a string representing a movie review and predicts its sentiment.
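
The sketch below shows why the classifier can be swapped freely, assuming `ModelGenerator` only relies on scikit-learn style `fit`, `predict`, and `score` methods (an assumption; the actual interface is not shown here). The two toy classifiers and the `fit_and_score` helper are hypothetical stand-ins written in plain Python 3, not part of this project.

```python
class _AccuracyMixin:
    def score(self, X, y):
        """Return the fraction of correct predictions."""
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


class MajorityClassifier(_AccuracyMixin):
    """Toy classifier that always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestCentroidClassifier(_AccuracyMixin):
    """Toy classifier that predicts the label of the closest class centroid."""

    def fit(self, X, y):
        grouped = {}
        for features, label in zip(X, y):
            grouped.setdefault(label, []).append(features)
        self.centroids_ = {
            label: [sum(column) / len(rows) for column in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, X):
        def squared_distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_,
                key=lambda label: squared_distance(features, self.centroids_[label]))
            for features in X
        ]


def fit_and_score(classifier, train_X, train_y, test_X, test_y):
    """Hypothetical stand-in for the fit-then-score step described above."""
    classifier.fit(train_X, train_y)
    return classifier.score(test_X, test_y)


# Made-up 2-d vectors and labels (1 = positive, 0 = negative).
train_X = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
train_y = [0, 0, 1, 1, 1]
test_X = [[0.1, 0.1], [1.0, 1.0]]
test_y = [0, 1]

for clf in (MajorityClassifier(), NearestCentroidClassifier()):
    print(type(clf).__name__, fit_and_score(clf, train_X, train_y, test_X, test_y))
```

Because both objects expose the same three methods, the calling code does not change when one is replaced by the other.
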
To turn the string into a vector, `SentenceVectorizer` is used, which implements a naive transformation that loses information about ordering and local context.
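
To illustrate how a naive transformation loses ordering, the sketch below averages per-word vectors, which is one common naive approach. Whether `SentenceVectorizer` does exactly this is an assumption; the function `average_vector` and the tiny two-dimensional vocabulary are made up for the example.

```python
def average_vector(sentence, word_vectors):
    """Average the vectors of the known words in a sentence.

    Hypothetical helper: unknown words are skipped, and word order
    plays no role in the result.
    """
    tokens = sentence.lower().split()
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("sentence contains no known words")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Made-up 2-d word vectors, for illustration only.
word_vectors = {
    "not": [-1.0, 0.0],
    "good": [1.0, 1.0],
    "bad": [-1.0, -1.0],
    "just": [0.0, 0.5],
}

a = average_vector("not good just bad", word_vectors)
b = average_vector("not bad just good", word_vectors)
print(a == b)  # the two reviews mean opposite things but share one vector
```

The two sentences contain the same words, so averaging maps them to the same point even though "not" modifies a different word in each.
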
A `constants.py` file was created because I imagined this could evolve into a pipeline that produces the data the model is trained on. Live or batched review data could then be fed in for classification. It is therefore reasonable to assume that all of this can live on a service that writes intermediate data to a local filesystem.
`ModelGenerator.py` works by reading in the data generated by `TrainTestDataSplitter.py` and wrapping each review in gensim's `LabeledSentence` class before feeding them into the `Doc2Vec` model. We then look up the resulting vectors by their previously created tags to assemble the training and testing vectors and labels, so that we can fit the input classifier and score it.
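
The tag-based lookup described above can be sketched without gensim. In this illustration a plain dict stands in for the tag-indexed document vectors of a trained `Doc2Vec` model, and the tag scheme (`TRAIN_POS_0` and so on), the helper names, and the vectors are all hypothetical.

```python
def make_tag(split, label, index):
    """Build a tag such as 'TRAIN_POS_3' (the naming scheme is an assumption)."""
    return "%s_%s_%d" % (split, label, index)


def assemble(doc_vectors, split, counts):
    """Collect vectors and numeric labels for one split by looking up tags.

    doc_vectors: mapping of tag -> vector, standing in for trained vectors
    counts: mapping of label name -> number of documents with that label
    """
    vectors, labels = [], []
    for label_name, numeric in (("POS", 1), ("NEG", 0)):
        for i in range(counts.get(label_name, 0)):
            vectors.append(doc_vectors[make_tag(split, label_name, i)])
            labels.append(numeric)
    return vectors, labels


# Made-up vectors keyed by the tags assigned when the documents were labeled.
doc_vectors = {
    "TRAIN_POS_0": [0.9, 0.8],
    "TRAIN_POS_1": [1.0, 0.7],
    "TRAIN_NEG_0": [-0.8, -0.9],
    "TEST_POS_0": [0.7, 0.9],
    "TEST_NEG_0": [-1.0, -0.6],
}

train_X, train_y = assemble(doc_vectors, "TRAIN", {"POS": 2, "NEG": 1})
test_X, test_y = assemble(doc_vectors, "TEST", {"POS": 1, "NEG": 1})
print(train_y, test_y)
```

The resulting `train_X`/`train_y` and `test_X`/`test_y` lists are in the shape a classifier's fit and score steps typically expect.
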
- https://www.tensorflow.org/tutorials/word2vec
- https://github.com/linanqiu/word2vec-sentiments
- https://ahmedbesbes.com/sentiment-analysis-on-twitter-using-word2vec-and-keras.html
- https://radimrehurek.com/gensim/index.html
- https://stackoverflow.com/questions/30795944/how-can-a-sentence-or-a-document-be-converted-to-a-vector