Python 3.5 classification of tweets (positive or negative) using NLTK-3 and sklearn.
An analysis of the Twitter data set included in the nltk corpus.
- An implementation of `nltk.NaiveBayesClassifier` trained against 1000 tweets. Implemented in `Train_Classifiers.py`.
- ScikitLearn implementations, in `Scikit_Learn_Classifiers.py`:
  - Naive Bayes:
    - `MultinomialNB`
    - `BernoulliNB`
  - Linear Model:
    - `LogisticRegression`
    - `SGDClassifier`
  - SVM:
    - `LinearSVC`
- A voting system to choose the best out of all the learning methods. Implemented in `sentiment_mod.py`.
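
The voting step can be pictured roughly as follows. This is a minimal sketch, not the code in sentiment_mod.py: it assumes each trained classifier exposes a `classify(features)` method (as NLTK classifiers do), and the names `VoteClassifier`, `confidence`, and `FixedClassifier` are hypothetical. It returns the label that most classifiers agree on, plus the fraction of votes that label received.

```python
from collections import Counter


class VoteClassifier:
    """Combine several classifiers by majority vote (hypothetical helper)."""

    def __init__(self, *classifiers):
        if not classifiers:
            raise ValueError("at least one classifier is required")
        self._classifiers = classifiers

    def _votes(self, features):
        # Ask every classifier for its label on the same feature set.
        return Counter(c.classify(features) for c in self._classifiers)

    def classify(self, features):
        # The label with the most votes wins.
        return self._votes(features).most_common(1)[0][0]

    def confidence(self, features):
        # Fraction of classifiers that agreed with the winning label.
        votes = self._votes(features)
        return votes.most_common(1)[0][1] / len(self._classifiers)


class FixedClassifier:
    """Stand-in for a trained model: always answers with one label."""

    def __init__(self, label):
        self._label = label

    def classify(self, features):
        return self._label


if __name__ == "__main__":
    voter = VoteClassifier(
        FixedClassifier("pos"), FixedClassifier("pos"), FixedClassifier("neg")
    )
    features = {"great": True, "boring": False}
    print(voter.classify(features), voter.confidence(features))
```

Using an odd number of classifiers avoids ties when there are only two labels.
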
Classifiers | Accuracy achieved |
---|---|
nltk.NaiveBayesClassifier | 73.0% |
ScikitLearn Implementations | |
BernoulliNB | 72.0% |
MultinomialNB | 75.0% |
LogisticRegression | 71.0% |
SGDClassifier | 69.0% |
SVC | 48.0% |
LinearSVC | 74.0% |
NuSVC | 75.0% |
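
For reading the table: accuracy is taken here to mean the percentage of test examples whose predicted label matches the true label. This README does not say how the test split was made, so the sketch below only illustrates the arithmetic; `accuracy_percent`, `toy_classify`, and the toy data are hypothetical. NLTK offers a comparable helper, `nltk.classify.accuracy`, which returns a fraction rather than a percentage.

```python
def accuracy_percent(classify, labeled_examples):
    """Return the share of examples labeled correctly, as a percentage.

    classify: a callable mapping a feature set to a label (hypothetical).
    labeled_examples: a list of (features, true_label) pairs.
    """
    if not labeled_examples:
        raise ValueError("need at least one labeled example")
    correct = sum(
        1 for features, label in labeled_examples if classify(features) == label
    )
    return 100.0 * correct / len(labeled_examples)


if __name__ == "__main__":
    # Toy rule: call a text positive when it contains the word "good".
    def toy_classify(features):
        return "pos" if features.get("good") else "neg"

    test_set = [
        ({"good": True}, "pos"),
        ({"good": False}, "neg"),
        ({"good": True}, "neg"),  # misclassified on purpose
        ({"good": False}, "neg"),
    ]
    print(accuracy_percent(toy_classify, test_set))
```
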
The simplest (and suggested) way is to install the required packages and their dependencies using either Anaconda or Miniconda.
After that, you can run:
$ conda update conda
$ conda install scikit-learn nltk
The dataset used in this package is bundled along with the nltk package.
Run your Python interpreter and download it:
>>> import nltk
>>> nltk.download('stopwords')
>>> nltk.download('movie_reviews')
NOTE: You can check system-specific installation instructions on the official nltk website.
Check that everything is working so far by running your interpreter again and importing these:
>>> import nltk
>>> from nltk.corpus import stopwords, movie_reviews
>>> import sklearn
>>>
If these imports work for you, then you are good to go!
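
The same check can be scripted. This is a small sketch using the standard library's `importlib.util.find_spec`; the function name `missing_packages` is hypothetical, and it only confirms that the packages are installed, not that the NLTK corpora have been downloaded.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["nltk", "sklearn"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```
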
- Clone the repo
$ git clone https://github.com/aalind0/Movie_Reviews-Sentiment_Analysis
$ cd Movie_Reviews-Sentiment_Analysis
- Order of running
  - NLTK_Naive_Bayes.py
  - Scikit_Learn_Classifiers.py
  - Voting_Algos.py
- Hack away!
"So what? Well, this is pretty basic!"
Yes, it is, but hey, we all start somewhere, right?
Coming up: I am working on a Twitter sentiment analysis project that first trains on a given data set, then takes in live Twitter feeds, analyses them, and plots the results for data visualization.
You can follow me on Twitter @singh_aalind to keep tabs on it.
Hacked together by Aalind Singh.