Sentiment-Analysis

Task : Performing sentiment analysis on movie reviews
Dataset used : IMDB Dataset of 50K Movie Reviews
Source : Kaggle
URL : https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews

Dataset metadata :

It consists of 50,000 rows, divided equally between positive and negative reviews. There are two columns, namely "Sentiment" and "Review".

Steps performed

Importing necessary libraries
Performing Exploratory Data Analysis
-> Showcasing 10 positive and negative sentiments
-> Dropping Duplicate values
-> Checking for NULL values
-> Displaying percentage of positive and negative sentiment
-> Analysing number of words in each category of sentiment
Data Cleaning
-> Decode HTML encoded characters
-> Removing stop words (only those that aren't negations)
-> Removing URLs
Tokenization
Stemming and Lemmatization
Displaying Word Cloud
Applying Tf-Idf vectorizer and different models
Applying Tf-Idf with bigrams
Applying Word2Vec as word embedding technique
Result
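
The data cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it uses only the Python standard library (the project itself lists BeautifulSoup and NLTK), and the stop-word set, the negation set, and the function name `clean_review` are assumptions made for the example.

```python
import html
import re

# Hypothetical, deliberately tiny stop-word list; NLTK's list is much longer.
STOP_WORDS = {"the", "a", "an", "is", "was", "this", "it", "and", "of", "to",
              "not", "no", "nor"}
# Negation words are kept because they can flip the sentiment of a phrase.
NEGATIONS = {"not", "no", "nor"}

TAG_RE = re.compile(r"<[^>]+>")               # leftover HTML tags such as <br />
URL_RE = re.compile(r"https?://\S+|www\.\S+")  # plain http(s) and www links
TOKEN_RE = re.compile(r"[a-z']+")              # lowercase words, apostrophes kept


def clean_review(text):
    """Return the cleaned, lowercased tokens of one review."""
    text = html.unescape(text)     # decode entities such as &amp; or &quot;
    text = TAG_RE.sub(" ", text)   # strip tags before URLs so "<br" is not swallowed
    text = URL_RE.sub(" ", text)   # remove URLs
    tokens = TOKEN_RE.findall(text.lower())
    removable = STOP_WORDS - NEGATIONS
    return [t for t in tokens if t not in removable]


sample = "This movie was not good &amp; the plot is weak.<br />See http://example.com"
print(clean_review(sample))
```

Keeping "not" in the output preserves the difference between "good" and "not good", which is the reason negative stop words are excluded from removal.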

Libraries

Pandas
Numpy
Sklearn
NLTK
Wordcloud
BeautifulSoup
Matplotlib
Gensim

Models

Decision Tree Classifier
Random Forest Classifier
Logistic Regression
KNN
Naive Bayes
SVM
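
As a rough sketch of how the Tf-Idf features (with and without bigrams) might be combined with two of the models above, the example below uses scikit-learn pipelines. The toy texts, the helper `build_pipeline`, and the choice of `LinearSVC` as the SVM variant are assumptions for illustration; the project's actual hyperparameters and SVM implementation are not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data, invented for illustration only; not taken from the IMDB dataset.
texts = [
    "a great and moving film",
    "wonderful acting and a great story",
    "not good at all",
    "a boring and weak plot",
]
labels = ["positive", "positive", "negative", "negative"]


def build_pipeline(classifier, ngram_range=(1, 1)):
    """Chain a Tf-Idf vectorizer with a classifier (hypothetical helper)."""
    return make_pipeline(TfidfVectorizer(ngram_range=ngram_range), classifier)


models = {
    # Unigram Tf-Idf features with Logistic Regression
    "logreg_unigram": build_pipeline(LogisticRegression(max_iter=1000)),
    # Unigram + bigram Tf-Idf features with a linear SVM
    "svm_bigram": build_pipeline(LinearSVC(), ngram_range=(1, 2)),
}

for name, model in models.items():
    model.fit(texts, labels)
    print(name, model.predict(["a great story"]))
```

On the real dataset the models would be fitted on a training split and scored on a held-out test split, for example with `sklearn.model_selection.train_test_split`.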

Result:

Using the Tf-Idf vectorizer for feature extraction, the highest accuracy is 0.87, obtained with the SVM model. Using Word2Vec as the word embedding technique, the highest accuracy is 0.88, obtained with both SVM and Logistic Regression.
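
Word2Vec produces one vector per word, while SVM and Logistic Regression need one feature vector per review. A common way to bridge this, shown below as an assumption since the exact method used in this project is not stated, is to average the vectors of the words in each review. The function name `document_vector` and the toy embeddings are hypothetical.

```python
import numpy as np


def document_vector(tokens, word_vectors, dim):
    """Average the vectors of the tokens that have an embedding."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        # No known word: fall back to a zero vector of the right size.
        return np.zeros(dim)
    return np.mean(known, axis=0)


# Toy 3-dimensional embeddings, invented for illustration only.
toy_vectors = {
    "good": np.array([1.0, 0.0, 2.0]),
    "film": np.array([3.0, 2.0, 0.0]),
}

# "unseen" has no embedding and is skipped.
features = document_vector(["good", "unseen", "film"], toy_vectors, dim=3)
print(features)
```

Any mapping that supports `in` and `[]` can be passed as `word_vectors`, so the word vectors of a trained Gensim model (`model.wv`) should be usable in place of the toy dictionary.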
