Sentiments_Analysis_IMDB_Movie_Reviews

Sentiment analysis of IMDB movie reviews using machine learning classifiers (Logistic Regression, Naive Bayes, SVM) and deep learning models (deep neural network, RNN, CNN).

Primary language: Jupyter Notebook. License: MIT.

  • Sentiment analysis is very useful in social media monitoring. It helps companies extract insights from social media data. For example, a company can gain an overview of wider public opinion about a particular topic.
  • In this project, we will work on sentiment analysis of IMDB Movie Reviews.

Description

  • The main aim of this project is to evaluate several ML classifiers on the task of predicting sentiment from the dataset. To do this, we build Bag of Words and TF-IDF models as input features for three ML classifiers (linear SVM, Naive Bayes, Logistic Regression), after applying text pre-processing techniques such as data cleaning, text normalization, stemming, and stopword removal to the IMDB movie reviews dataset.
  • Finally, we attempt the same problem with deep learning: a simple deep neural network, an RNN (LSTM) with dropout, a 2-layer RNN (LSTM), and a CNN.
  • The recurrent models use LSTM layers followed by a fully connected layer with softmax as the activation function.
  • IMDB (Internet Movie Database) is the world's most popular source of movie, TV, and celebrity content, where we can also find ratings and reviews for the latest movies and TV shows. The dataset contains 50k reviews, both negative and positive, and is taken from Kaggle: https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
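
The pre-processing and feature-extraction steps described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the notebook's actual code: the stopword list, the `naive_stem` suffix stripper, and the function names `preprocess`, `bag_of_words`, and `tfidf_vectors` are all assumptions made for the example, and the TF-IDF weighting shown is just one common variant.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real run would use a fuller list).
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to", "in"}


def naive_stem(token):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review):
    """Clean, normalize, remove stopwords, and stem a raw review string."""
    text = re.sub(r"<[^>]+>", " ", review)          # data cleaning: drop HTML tags
    text = re.sub(r"[^a-z\s]", " ", text.lower())   # normalization: lowercase, letters only
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return [naive_stem(t) for t in tokens]


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary term occurs in one tokenized review."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tfidf_vectors(docs):
    """Weight term counts by idf = ln(N / df) + 1 over a list of tokenized reviews."""
    vocabulary = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    doc_freq = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n_docs / doc_freq[t]) + 1.0 for t in vocabulary}
    vectors = [[c * idf[t] for t, c in zip(vocabulary, bag_of_words(doc, vocabulary))]
               for doc in docs]
    return vocabulary, vectors


if __name__ == "__main__":
    reviews = ["This movie was <br /> AMAZING, loved it!", "Boring movie."]
    docs = [preprocess(r) for r in reviews]
    vocab, vectors = tfidf_vectors(docs)
    print(vocab)
    print(vectors)
```

The resulting vectors are the kind of input a linear SVM, Naive Bayes, or Logistic Regression classifier would be trained on. Terms that appear in every review (such as "movie" here) receive the lowest weight under TF-IDF, which is the main difference from the plain Bag of Words counts.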

The goal is to build deep learning neural networks and ML classifiers that classify/predict whether a textual movie review is positive or negative.
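
To illustrate the final prediction step, the sketch below shows how a softmax over the two outputs of a fully connected layer turns raw scores into a positive/negative decision. It is a plain-Python illustration under assumptions: the label order in `LABELS`, the `predict` helper, and the example scores are hypothetical and are not taken from the notebook.

```python
import math

# Assumed class order for the two output units (hypothetical).
LABELS = ("negative", "positive")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


if __name__ == "__main__":
    # Example scores as they might come out of the last fully connected layer.
    label, confidence = predict([-1.0, 2.0])
    print(label, round(confidence, 3))
```

With only two classes, softmax is equivalent to a sigmoid on the difference of the two scores, so a single sigmoid output unit would be an equally valid design.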

Conclusion:

  • The CNN model gives the best accuracy of all the models, with 83.21% on the training and testing datasets.