This project classifies IMDB user reviews into three classes: negative, neutral, and positive. We apply several methods and compare the results; combining TF-IDF features with an SVM gives the best score.
The reviews come from the IMDB website. The training set consists of 3000 samples, and 750 samples are used for validation and testing.
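As an illustration of the TF-IDF plus SVM combination, here is a minimal sketch. It assumes scikit-learn is available and uses `TfidfVectorizer` with `LinearSVC`; the toy reviews, the P/N/Z label letters attached to them, and the default hyperparameters are assumptions made for the example and are not taken from train.py.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy reviews standing in for the real training data (hypothetical examples).
train_texts = [
    "a wonderful, moving film with great acting",
    "great story and a wonderful cast",
    "terrible plot and awful acting",
    "awful, boring and terrible",
    "an average film, neither good nor bad",
    "average story, it was okay",
]
# Assumed label letters: P = positive, N = negative, Z = neutral.
train_labels = ["P", "P", "N", "N", "Z", "Z"]

# TF-IDF turns each review into a weighted bag-of-words vector;
# the linear SVM then learns one-vs-rest decision boundaries over those vectors.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(train_texts, train_labels)

# Predict a class for each unseen review and join the letters into one string.
predictions = model.predict(["wonderful acting", "terrible and boring"])
result = "".join(predictions)
print(result)
```

Wrapping both steps in one pipeline means the vocabulary and IDF weights learned from the training set are reused unchanged at prediction time.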
-
Install the necessary packages.
pip install -r requirements
-
Train (this requires a training set in the directory "TRAIN").
python3 train.py
-
Test.
python3 462project_step2_Solis.py step2_model_Solis.pkl TEST
The script outputs a string of P, N, and Z characters, one per document, giving the predicted class of each document.
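To make the output string easier to read, it can be decoded into class names. The sketch below is not part of the project: `decode_predictions` and `LABEL_NAMES` are hypothetical names, and the mapping P = positive, N = negative, Z = neutral is an assumption inferred from the three classes described above.

```python
from collections import Counter

# Assumed mapping from output characters to class names (not confirmed by the project).
LABEL_NAMES = {"P": "positive", "N": "negative", "Z": "neutral"}


def decode_predictions(output: str) -> list[str]:
    """Turn a string such as 'PNZ' into a list of class names, one per document."""
    cleaned = output.strip()  # drop a trailing newline from the script's output
    unknown = set(cleaned) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unexpected characters in output: {sorted(unknown)}")
    return [LABEL_NAMES[ch] for ch in cleaned]


# Example with a made-up output string for five documents.
labels = decode_predictions("PNZZP\n")
counts = Counter(labels)
print(labels)
print(counts)
```

Counting the decoded labels gives a quick view of how the predictions are distributed across the three classes.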