
[Natural Language Processing] Uses NLTK 3 and scikit-learn to train several machine learning classifiers, then combines them with an averaging system to improve sentiment analysis of Twitter feeds.

Primary Language: Python

Movie Reviews - Sentiment Analysis

Python 3.5 classification of tweets (positive or negative) using NLTK-3 and sklearn.

An analysis of the Twitter data set included in the NLTK corpus.

What is in this repo

  • An implementation of nltk.NaiveBayesClassifier trained against 1000 tweets. Implemented in Train_Classifiers.py.
  • Naive Bayes
    • MultinomialNB
    • BernoulliNB
  • Linear Model
    • LogisticRegression
    • SGDClassifier
  • SVM
    • LinearSVC

The scikit-learn classifiers listed above are implemented in Scikit_Learn_Classifiers.py.

  • A voting system that picks the final label from the predictions of all the learning methods above. Implemented in sentiment_mod.py.
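As a rough illustration of the voting idea, here is a minimal sketch. It assumes simple majority voting and that every classifier exposes a `classify(features)` method, as NLTK classifiers do. The names `VoteClassifier` and `KeywordClassifier` are hypothetical and are not taken from sentiment_mod.py; the real implementation may differ.

```python
from collections import Counter


class VoteClassifier:
    """Combine several classifiers by majority vote (hypothetical helper)."""

    def __init__(self, *classifiers):
        if not classifiers:
            raise ValueError("at least one classifier is required")
        self._classifiers = classifiers

    def _votes(self, features):
        # Assumption: each classifier exposes classify(features) -> label.
        return [c.classify(features) for c in self._classifiers]

    def classify(self, features):
        """Return the label chosen by the most classifiers."""
        label, _count = Counter(self._votes(features)).most_common(1)[0]
        return label

    def confidence(self, features):
        """Return the fraction of classifiers that agree with the winner."""
        votes = self._votes(features)
        _label, count = Counter(votes).most_common(1)[0]
        return count / len(votes)


class KeywordClassifier:
    """Toy stand-in for a trained classifier, for demonstration only."""

    def __init__(self, positive_word):
        self._positive_word = positive_word

    def classify(self, features):
        return "pos" if features.get(self._positive_word) else "neg"


voter = VoteClassifier(
    KeywordClassifier("great"),
    KeywordClassifier("fun"),
    KeywordClassifier("moving"),
)
features = {"great": True, "fun": True, "moving": False}
print(voter.classify(features), voter.confidence(features))
```

Here two of the three toy classifiers vote "pos", so the combined label is "pos" with a confidence of 2/3. Using an odd number of classifiers avoids ties in a two-label problem.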

Accuracy achieved

Classifiers                    Accuracy achieved
nltk.NaiveBayesClassifier      73.0%
ScikitLearn Implementations
  BernoulliNB                  72.0%
  MultinomialNB                75.0%
  LogisticRegression           71.0%
  SGDClassifier                69.0%
  SVC                          48.0%
  LinearSVC                    74.0%
  NuSVC                        75.0%
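For reference, an accuracy figure like the ones above is normally the percentage of held-out examples whose predicted label matches the true label. The sketch below shows that calculation in plain Python. The function name `accuracy_percent` and the sample labels are made up for illustration and are not results from this repo. For NLTK classifiers, `nltk.classify.accuracy(classifier, test_set)` performs a similar computation and returns a fraction.

```python
def accuracy_percent(predicted, gold):
    """Percentage of predictions that match the gold labels."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty test set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Made-up labels for illustration; these are not results from this repo.
gold = ["pos", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "neg"]
print(f"{accuracy_percent(predicted, gold):.1f}%")
```

With three of the four sample predictions correct, this prints 75.0%.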


The simplest (and suggested) way is to install the required packages and their dependencies using either Anaconda or Miniconda.

After that, run:

$ conda update conda
$ conda install scikit-learn nltk

Downloading the dataset

The dataset used in this package is distributed through the NLTK corpus downloader.

Start your Python interpreter and run:

>>> import nltk
>>> nltk.download('stopwords')
>>> nltk.download('movie_reviews') 

NOTE: You can find system-specific installation instructions on the official NLTK website.

Check that everything works so far by starting your interpreter again and running these imports:

>>> import nltk
>>> from nltk.corpus import stopwords, movie_reviews
>>> import sklearn

If these imports work, you are good to go!
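The same check can be scripted if you prefer. The sketch below uses only the standard library to report which packages are missing. The helper name `missing_packages` is hypothetical, and the check covers the packages only, not the downloaded corpora.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_packages(["nltk", "sklearn"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("nltk and sklearn are importable.")
```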

Running it

  1. Clone the repo
$ git clone https://github.com/aalind0/Movie_Reviews-Sentiment_Analysis
$ cd Movie_Reviews-Sentiment_Analysis
  2. Run the scripts in this order:
     1. NLTK_Naive_Bayes.py
     2. Scikit_Learn_Classifiers.py
     3. Voting_Algos.py
  3. Hack away!
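The run order above can also be scripted. This is a minimal sketch, assuming the three scripts sit in the current directory and take no arguments. The helpers `build_commands` and `run_all` are hypothetical and are not part of the repo.

```python
import subprocess
import sys

# Script names are taken from the run order listed above.
SCRIPTS = [
    "NLTK_Naive_Bayes.py",
    "Scikit_Learn_Classifiers.py",
    "Voting_Algos.py",
]


def build_commands(scripts, python=sys.executable):
    """Return one argv list per script, in the given order."""
    return [[python, script] for script in scripts]


def run_all(scripts):
    """Run each script in turn, stopping at the first failure."""
    for command in build_commands(scripts):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)


# From the repository root, call: run_all(SCRIPTS)
```

Stopping at the first failure keeps a later script from running when an earlier one did not finish.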


"So what? This is pretty basic!"

Yes, it is, but hey, we all start somewhere, right?

Coming up: I am working on a Twitter sentiment analysis project that first trains on a given data set, then takes in live Twitter feeds, analyses them, and plots them for data visualization.

You can follow me on Twitter @singh_aalind to keep tabs on it.


Hacked together by Aalind Singh.