Primary language: Jupyter Notebook

Sentiment-Analysis

Dataset - IMDB Dataset of 50K Movie Reviews (https://www.kaggle.com/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews/notebooks)

Examined a dataset of 50,000 IMDB movie reviews, each labelled positive or negative, to predict a review's sentiment from its text.
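A minimal sketch of loading the data, assuming the Kaggle CSV has a header row with a `review` column and a `sentiment` column holding the strings `positive` or `negative`, encoded here as 1 and 0. It uses Python's standard `csv` module rather than whatever the notebook used, and the two sample rows are invented for illustration.

```python
import csv
import io

# Assumed label encoding: "positive" -> 1, "negative" -> 0.
LABELS = {"negative": 0, "positive": 1}


def load_reviews(csv_file):
    """Read parallel lists of review texts and 0/1 labels from a CSV file object.

    Assumes a header row with "review" and "sentiment" columns. For the real
    download, pass open(path, newline="", encoding="utf-8").
    """
    texts, labels = [], []
    for row in csv.DictReader(csv_file):
        texts.append(row["review"])
        labels.append(LABELS[row["sentiment"]])
    return texts, labels


# Invented two-row sample standing in for the 50,000-row file.
sample = io.StringIO(
    "review,sentiment\n"
    '"A wonderful, moving film.",positive\n'
    '"Dull plot and wooden acting.",negative\n'
)
texts, labels = load_reviews(sample)
print(labels)  # [1, 0]
```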

Cleaned the text with NLTK (Natural Language Toolkit) and supporting libraries: BeautifulSoup to remove HTML tags, general preprocessing, stopword removal, tokenization ahead of vectorization, and word clouds for visualization.
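The cleaning steps can be sketched with the standard library alone. This is not the notebook's code: Python's `html.parser` stands in for BeautifulSoup, a regular expression stands in for NLTK's tokenizer, and `STOPWORDS` is a hypothetical list far shorter than NLTK's English stopword corpus. IMDB reviews often contain `<br />` tags, which is why HTML stripping comes first.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stopword list; NLTK's English list is much larger.
STOPWORDS = {"a", "an", "and", "the", "is", "was", "it", "this", "of", "to", "i"}


class _TextExtractor(HTMLParser):
    """Collects only the text between tags; the tags themselves are dropped."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return the text content of an HTML fragment, pieces joined by spaces."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any trailing text still buffered by the parser
    return " ".join(parser.parts)


def clean_review(text):
    """Strip HTML, lowercase, keep alphabetic tokens, and drop stopwords."""
    text = strip_html(text).lower()
    tokens = re.findall(r"[a-z]+", text)
    return [tok for tok in tokens if tok not in STOPWORDS]


print(clean_review("This movie was <b>great</b>!<br /><br />I loved it."))
# ['movie', 'great', 'loved']
```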

Prepared the data for modelling with feature engineering and two text representations: bag of words and TF-IDF vectorization.
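To make the two representations concrete, here is a from-scratch sketch. Bag of words counts each vocabulary term per review; TF-IDF scales those counts by a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) the number containing term t, and then scales each row to unit L2 norm. That formula mirrors scikit-learn's `TfidfVectorizer` defaults; whether the notebook kept those defaults is an assumption, and the toy token lists are invented.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of distinct tokens; a term's position is its column index."""
    return sorted({tok for doc in docs for tok in doc})


def bag_of_words(docs, vocab):
    """Raw term counts: one row per document, one column per vocabulary term."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])  # Counter gives 0 if absent
    return rows


def tf_idf(docs, vocab):
    """Counts weighted by smoothed idf, then each row scaled to unit L2 norm."""
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]
    df = {term: sum(term in s for s in doc_sets) for term in vocab}
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in vocab}
    rows = []
    for counts in bag_of_words(docs, vocab):
        weights = [c * idf[term] for c, term in zip(counts, vocab)]
        norm = math.sqrt(sum(wt * wt for wt in weights)) or 1.0  # avoid /0 for empty docs
        rows.append([wt / norm for wt in weights])
    return rows


# Invented, already-cleaned token lists standing in for three reviews.
docs = [["great", "movie"], ["boring", "movie"], ["great", "great", "acting"]]
vocab = build_vocabulary(docs)
print(vocab)                      # ['acting', 'boring', 'great', 'movie']
print(bag_of_words(docs, vocab))  # [[0, 0, 1, 1], [0, 1, 0, 1], [1, 0, 2, 0]]
```

Terms that appear in fewer reviews (here `boring`) receive a larger idf, so they outweigh common terms such as `movie` in the TF-IDF rows.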

Performed sentiment analysis (supervised learning) on each representation with several machine learning algorithms: univariate and multivariate classification, Random Forest, Logistic Regression and SVM (Support Vector Machine).
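As an illustration of one of the listed classifiers, the following sketch trains logistic regression by batch gradient descent on the log loss. The notebook presumably relied on a library implementation; the learning rate, epoch count, and the two-feature toy data (hypothetical counts of "great" and "boring" per review) are assumptions chosen only to show the mechanics. In practice each row of `X` would be a bag-of-words or TF-IDF vector.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by batch gradient descent on the mean log loss."""
    m, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss with respect to the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, X):
    """Predict 1 (positive) when the estimated probability is at least 0.5."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]


# Hypothetical features: [count of "great", count of "boring"] per review.
X = [[2, 0], [1, 0], [3, 1], [0, 2], [0, 1], [1, 3]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic_regression(X, y)
print(predict(w, b, [[4, 0], [0, 4]]))  # [1, 0]
```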

Achieved a final accuracy of 90.65% with Random Forest on the TF-IDF-vectorized data, using a random 70:30 train-test split.
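The evaluation protocol (a random 70:30 split, then accuracy on the held-out 30%) can be sketched as follows. `train_test_split` here is a hypothetical standard-library stand-in for scikit-learn's function of the same name, and the seed of 42 is arbitrary since the source does not say which was used. On 50,000 reviews this split holds out 15,000 for testing.

```python
import random


def _take(seq, idx):
    """Select the elements of seq at the given positions."""
    return [seq[i] for i in idx]


def train_test_split(items, labels, test_size=0.3, seed=42):
    """Shuffle indices with a fixed seed and hold out test_size of the data.

    Hypothetical stand-in for scikit-learn's train_test_split; returns
    (X_train, X_test, y_train, y_test) in the same order as that function.
    """
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(items) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (_take(items, train_idx), _take(items, test_idx),
            _take(labels, train_idx), _take(labels, test_idx))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


reviews = [f"review {i}" for i in range(50_000)]
labels = [i % 2 for i in range(50_000)]
X_train, X_test, y_train, y_test = train_test_split(reviews, labels)
print(len(X_train), len(X_test))  # 35000 15000
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Accuracy is correct predictions divided by test-set size, so a 90.65% score means roughly nine in ten held-out reviews were labelled correctly.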