A movie review system based on users' comments rather than their ratings. Built using NLP and sentiment analysis.
Clone the repository and run npm install to install all modules.
The machine learning and prediction are done in Python, using the NLTK library for natural language processing. Several classifiers are used:
- Multinomial Naive Bayes
- Bernoulli Naive Bayes
- Logistic Regression
- SGD Classifier
- SVC
- Linear Support Vector Classifier
- NuSVC
All of these classifiers are combined in a voting classifier class:

    from statistics import mode

    from nltk.classify import ClassifierI


    class VoteClassifier(ClassifierI):
        def __init__(self, classifiers):
            self.classifiers = classifiers

        def classify(self, featureset):
            # Collect one vote per classifier and return the most common label.
            votes = []
            for classifier in self.classifiers:
                v = classifier.classify(featureset)
                votes.append(v)
            return mode(votes)

        def confidence(self, featureset):
            # Fraction of classifiers that voted for the winning label.
            votes = []
            for classifier in self.classifiers:
                v = classifier.classify(featureset)
                votes.append(v)
            choiceVotes = votes.count(mode(votes))
            conf = choiceVotes / len(votes)
            return conf

Here:
- classify(self, featureset) classifies the given input. It collects one prediction from each classifier and returns the mode, i.e. the label that most of them voted for.
- confidence(self, featureset) gives the confidence, or reliability, of the prediction: the fraction of classifiers that voted for the winning label.
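
To make the voting concrete, here is a minimal sketch that runs without NLTK. `StubClassifier` and `vote` are hypothetical stand-ins and not part of the project. Each stub always returns a fixed label, which is enough to show how the winning label and the confidence fall out of seven votes.

```python
from statistics import mode


class StubClassifier:
    """Hypothetical stand-in for a trained classifier: always returns one label."""

    def __init__(self, label):
        self.label = label

    def classify(self, featureset):
        return self.label


def vote(classifiers, featureset):
    """Return (winning label, fraction of classifiers that agreed with it)."""
    votes = [c.classify(featureset) for c in classifiers]
    winner = mode(votes)
    return winner, votes.count(winner) / len(votes)


# Seven voters, mirroring the seven classifiers listed earlier:
# five say "pos", two say "neg".
classifiers = [StubClassifier("pos")] * 5 + [StubClassifier("neg")] * 2
label, conf = vote(classifiers, {"great": True})
print(label, round(conf, 2))  # pos 0.71
```

With an odd number of voters and two labels there is always a strict majority, so the mode is well defined.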
The trained classifiers are saved with pickle in ./classifiers so that they do not have to be retrained on every run. The voted classifier is saved alongside them.
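
The save-and-reload step might look like the sketch below. The helper names `save_classifier` and `load_classifier` and the file name `example.pickle` are assumptions for illustration. The project's actual file names are not given here.

```python
import os
import pickle
import tempfile


def save_classifier(classifier, path):
    """Serialize a trained classifier to disk so it need not be retrained."""
    with open(path, "wb") as f:
        pickle.dump(classifier, f)


def load_classifier(path):
    """Load a previously pickled classifier."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Demo: a plain dict stands in for a trained model, and a temporary
# directory stands in for ./classifiers.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "example.pickle")
save_classifier({"model": "stand-in"}, demo_path)
restored = load_classifier(demo_path)
print(restored)
```

Unpickling can execute arbitrary code, so only load pickle files that you created yourself or otherwise trust.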
Node simply calls the Python script and passes the paragraph as an argument. The script writes the prediction to stdout.

    // stdout receives the prediction printed by the script
    exec('python "./externalScripts/predict.py" "' + review + '"', function (error, stdout, stderr) {});
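
On the Python side, a script invoked this way reads the review from its first command-line argument and prints the label. The following is a sketch of that shape only, not the project's actual predict.py. `predict` is a hypothetical placeholder using a toy keyword rule where the real script would presumably load the pickled voted classifier.

```python
import sys


def predict(review):
    """Hypothetical placeholder: a toy keyword rule, NOT the project's model."""
    return "pos" if "good" in review.lower() else "neg"


def main(argv):
    """Read the review from argv[1] and write the prediction to stdout."""
    if len(argv) < 2:
        print('usage: predict.py "<review text>"', file=sys.stderr)
        return 1
    print(predict(argv[1]))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Because the review is spliced into a shell command string, a review containing a double quote would break the command. Passing the arguments as an array on the Node side avoids that quoting problem.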
The average prediction time over 25 multi-paragraph reviews is 15 seconds.
Highest: 97% Lowest: 83% Average: 90.06% Mode: 93%