This project focuses on performing sentiment analysis on Stanford's movie review dataset. Various machine learning models, including Linear SVM, Naive Bayes, Logistic Regression, Random Forest, and Gradient Boosting, are trained and evaluated to classify movie reviews as positive or negative.
The dataset can be downloaded from Stanford's website at the following link: Stanford Movie Review Dataset
After downloading, unzip the files and place them in a directory named `data` within the project's root directory.
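As a rough illustration of reading the unzipped files, the sketch below assumes each split directory contains `pos` and `neg` subfolders with one `.txt` file per review. That layout, the helper name `load_reviews`, and the example path `data/train` are assumptions for illustration, not details confirmed by this project.

```python
from pathlib import Path


def load_reviews(split_dir):
    """Read reviews from <split_dir>/pos and <split_dir>/neg.

    Hypothetical helper: assumes one plain-text review per .txt file.
    Returns (texts, labels) with label 1 for positive and 0 for negative.
    """
    texts, labels = [], []
    for label, folder in ((1, "pos"), (0, "neg")):
        # Sort the paths so the ordering is reproducible across runs.
        for path in sorted(Path(split_dir, folder).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

With that layout, a call such as `train_texts, train_labels = load_reviews("data/train")` would return the reviews and their labels as two parallel lists.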
To set up the required environment, run `pip install -r requirements.txt`.
The experiments produced the following accuracies on the test set:
Model | Test Accuracy (%) |
---|---|
Linear SVM | 88.82 |
Logistic Regression | 87.37 |
Naive Bayes | 85.66 |
Random Forest | 84.49 |
Gradient Boosting | 80.75 |
Linear SVM (88.82%) and Logistic Regression (87.37%) achieved the highest test accuracies, with the two linear models ahead of Naive Bayes (85.66%) and both tree ensembles. Gradient Boosting scored lowest at 80.75%.
For a more detailed comparison and analysis, refer to the output plots generated by the script or the provided report.
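A comparison like the one in the table above could be sketched with scikit-learn as follows. This is a minimal sketch, not the project's actual script: the TF-IDF features, the hyperparameters, and the function name `compare_models` are all assumptions made for illustration.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def compare_models(train_texts, train_labels, test_texts, test_labels):
    """Fit each classifier on TF-IDF features; return test accuracy in percent.

    Hypothetical helper: feature extraction and settings are assumed,
    not taken from the project.
    """
    models = {
        "Linear SVM": LinearSVC(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes": MultinomialNB(),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    }
    results = {}
    for name, clf in models.items():
        # A fresh vectorizer per model keeps each pipeline independent.
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        pipeline.fit(train_texts, train_labels)
        predictions = pipeline.predict(test_texts)
        results[name] = 100.0 * accuracy_score(test_labels, predictions)
    return results
```

The function returns a dictionary mapping each model name to its test accuracy, which can be printed or plotted. The numbers it yields depend on the features and settings chosen, so they would not necessarily match the table.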
Parsa Mazaheri, October 2023