This repo contains the submission entry for an in-class NLP sentiment analysis competition held at the Microsoft AI Singapore group. It uses techniques learned in class to classify text as having positive or negative sentiment.
We recommend installing Anaconda, a pre-packaged Python distribution that contains all of the necessary libraries and software for this project. Alternatively, you can use Google Colaboratory, which lets you write and execute Python code in your browser.
Data
Data for this in-class competition comes from the Sentiment140 dataset. The training and test sets are random samples of 10% and 5% of the dataset, respectively.
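As a rough illustration of this kind of sampling, here is a minimal sketch using only the Python standard library. The function name `sample_split`, the fixed seed, and the choice to keep the two samples disjoint are assumptions for illustration; the README does not say how the competition samples were actually drawn.

```python
import random

def sample_split(rows, train_frac=0.10, test_frac=0.05, seed=42):
    """Draw two random samples from rows (hypothetical helper).

    Assumption: the train and test samples do not overlap.
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_train = round(len(rows) * train_frac)
    n_test = round(len(rows) * test_frac)
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:n_train + n_test]]
    return train, test

# Toy stand-in for the dataset: (polarity, text) pairs. In the public
# Sentiment140 CSV, polarity 0 marks negative and 4 marks positive tweets.
rows = [(0 if i % 2 else 4, f"tweet {i}") for i in range(1000)]
train, test = sample_split(rows)
print(len(train), len(test))
```

Shuffling the indices once and slicing keeps the two samples disjoint and makes the split reproducible for a given seed.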
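Open SentimentAnalysis.ipynb in a Jupyter notebook environment or in Google Colaboratory. The notebook covers the following methods, with accuracy in brackets where reported: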
- VADER (Valence Aware Dictionary and sEntiment Reasoner) [67%]
- Naive Bayes
- Linear SVM (Support Vector Machine) [80%]
- Decision Tree
- Random Forest
- Extra Trees
- SVC [80%]
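To give a feel for how one of the listed methods works, the following is a minimal from-scratch sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the notebook's code: the `NaiveBayes` class, the whitespace tokenizer, and the toy sentences are all assumptions made for illustration, and the notebook may well rely on a library implementation instead.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens (illustrative sketch)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> document count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            # Add-one smoothing: every vocabulary word gets one extra count.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in text.lower().split():
                if token in self.vocab:  # skip tokens never seen in training
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

# Made-up toy examples, not rows from the competition data.
train_texts = [
    "i love this movie",
    "what a great day",
    "i hate this traffic",
    "such an awful day",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("i love this great day"))
print(model.predict("awful traffic"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.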
Open SentimentAnalysis_RNN.ipynb in a Jupyter notebook environment or in Google Colaboratory.
The LSTM deep learning method [79%] did not outperform the SVC and linear SVM methods [80%].
Open SentimentAnalysis_BERT.ipynb in a Jupyter notebook environment or in Google Colaboratory.
The state-of-the-art BERT transformer model performs slightly better, at [82%] accuracy.