🎬 Sentiment Analysis with IMDB Movie Reviews 🎥

Project Overview 📝

This project classifies IMDB movie reviews as positive or negative using classical machine learning techniques. It covers data preprocessing, text cleaning, TF-IDF feature extraction, and model training with Naive Bayes and Support Vector Machine (SVM) classifiers.

Type: Natural Language Processing (NLP)
Language: Python

🛠️ Libraries Used

The project relies on the following libraries:

Pandas: For data manipulation and analysis
NumPy: For efficient numerical operations
Matplotlib: For plotting and data visualization
Scikit-Learn: To implement and evaluate machine learning models
NLTK: For tokenization and lemmatization
Regular Expressions (re): To clean and refine text data

📊 Dataset

We use the IMDB Movie Reviews Dataset. The dataset file, IMDB Dataset.csv, includes two columns:

review: The actual movie review text
sentiment: The sentiment label (positive or negative)
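
As a rough sketch of the loading and inspection step, the snippet below reads a CSV with the two columns described above and checks for missing values and class balance. The helper name `load_reviews` and the three sample rows are hypothetical and exist only so the example runs without the real file; in the project you would pass the path to IMDB Dataset.csv instead.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for "IMDB Dataset.csv" (made-up rows, for illustration only).
SAMPLE_CSV = io.StringIO(
    "review,sentiment\n"
    '"A wonderful, moving film.",positive\n'
    '"Dull plot and wooden acting.",negative\n'
    '"I loved every minute of it.",positive\n'
)


def load_reviews(source):
    """Read the reviews CSV and check that both expected columns are present."""
    df = pd.read_csv(source)
    missing = {"review", "sentiment"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df


df = load_reviews(SAMPLE_CSV)  # in the project: load_reviews("IMDB Dataset.csv")
print(df.shape)                          # (rows, columns)
print(df.isnull().sum())                 # missing values per column
print(df["sentiment"].value_counts())    # class distribution
```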

📝 Steps

The project proceeds in the following steps:

Import Libraries: Get the essential tools ready for data processing, visualization, and machine learning.
Load and Inspect Data: Peek into the dataset, check for any missing values, and understand the data distribution.
Data Preprocessing: Transform text to lowercase, clean out HTML tags, tokenize reviews, and perform lemmatization.
Data Preparation: Split the data into training and testing sets, encode labels, and convert text into TF-IDF features.
Model Training and Evaluation: Train and test Naive Bayes and Support Vector Machine models, then evaluate their performance with accuracy scores, confusion matrices, and classification reports.
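
The preprocessing step listed above could look roughly like the following. This is a minimal sketch using only the standard library, not necessarily the project's exact cleaning code: the function name `clean_review` and both regular expressions are assumptions. Lemmatization is left out here because NLTK's `WordNetLemmatizer` needs the WordNet corpus to be downloaded first; it would be applied to each token after this step.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")      # matches HTML tags such as <br />
TOKEN_RE = re.compile(r"[a-z']+")    # runs of lowercase letters and apostrophes


def clean_review(text):
    """Lowercase a review, strip HTML tags, and return a list of word tokens."""
    text = html.unescape(text)       # turn entities like &amp; back into characters
    text = TAG_RE.sub(" ", text)     # replace each tag with a space
    text = text.lower()
    return TOKEN_RE.findall(text)    # punctuation and digits are dropped


tokens = clean_review("Great movie!<br /><br />I LOVED it &amp; so did my kids.")
print(tokens)
# ['great', 'movie', 'i', 'loved', 'it', 'so', 'did', 'my', 'kids']
```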

✨ Features

The project includes the following features:

Data Preprocessing: Clean and tokenize text, strip HTML tags, and normalize text.
Feature Extraction: Convert text into numerical features using TF-IDF vectorization.
Model Training: Build and train Naive Bayes and SVM classifiers.
Evaluation: Assess model performance with accuracy scores, confusion matrices, and detailed classification reports.
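
The feature extraction and training features above can be sketched with scikit-learn as follows. The eight toy reviews are made up so the example runs on its own, and the choice of `MultinomialNB` and `LinearSVC` is an assumption: the project only states that it uses Naive Bayes and SVM classifiers, not which variants. Accuracy on data this small is not meaningful; the point is the order of operations.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC

# Made-up toy data standing in for the cleaned IMDB reviews.
reviews = [
    "an excellent film with wonderful acting",
    "wonderful story and excellent cast",
    "i loved this brilliant movie",
    "a brilliant and moving story",
    "a terrible film with boring acting",
    "boring story and terrible cast",
    "i hated this awful movie",
    "an awful and dull story",
]
labels = ["positive"] * 4 + ["negative"] * 4

# Encode string labels as integers (classes are sorted: negative=0, positive=1).
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

X_train, X_test, y_train, y_test = train_test_split(
    reviews, y, test_size=0.25, stratify=y, random_state=42
)

# Fit TF-IDF on the training split only, then reuse it for the test split.
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

models = {"Naive Bayes": MultinomialNB(), "Linear SVM": LinearSVC()}
scores = {}
for name, model in models.items():
    model.fit(X_train_tfidf, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test_tfidf))
    print(f"{name}: accuracy = {scores[name]:.2f}")
```

Fitting the vectorizer on the training split alone keeps vocabulary and IDF statistics from the test reviews out of the model.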

Usage 🚀

Preprocess the data: Clean and tokenize the text data.
Train the model: Fit a machine learning model on the training data.
Evaluate the model: Test the model on the test data and calculate metrics like accuracy, precision, recall, etc.
Predict sentiment: Use the trained model to predict the sentiment of new reviews.
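
One way to wire these usage steps together is a scikit-learn pipeline that bundles vectorization and classification, so new reviews can be passed in as raw strings. This is a sketch under assumptions: the helper `predict_sentiment`, the four training sentences, and the choice of `MultinomialNB` are illustrative and not taken from the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training data, for illustration only.
train_reviews = [
    "an excellent film with wonderful acting",
    "wonderful story and excellent cast",
    "a terrible film with boring acting",
    "boring story and terrible cast",
]
train_labels = ["positive", "positive", "negative", "negative"]

# The pipeline applies TF-IDF and then the classifier in a single fit/predict call.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_reviews, train_labels)


def predict_sentiment(texts):
    """Return a list of predicted labels for a list of raw review strings."""
    return [str(label) for label in model.predict(texts)]


new_reviews = ["An excellent and wonderful film", "A terrible, boring mess"]
print(predict_sentiment(new_reviews))
```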

Modeling 🧠

The project explores several machine learning models, including:

Logistic Regression
Support Vector Machines (SVM)
Naive Bayes
Random Forest

We also experimented with hyperparameter tuning to improve model performance.
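
The tuning procedure is not described in detail, so the following is only one plausible setup: a grid search over the TF-IDF n-gram range and the SVM regularization strength `C` using scikit-learn's `GridSearchCV`. The parameter grid, the 2-fold cross-validation, and the toy data are all assumptions chosen to keep the example small.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Made-up toy data, for illustration only.
reviews = [
    "an excellent film with wonderful acting",
    "wonderful story and excellent cast",
    "i loved this brilliant movie",
    "a brilliant and moving story",
    "a terrible film with boring acting",
    "boring story and terrible cast",
    "i hated this awful movie",
    "an awful and dull story",
]
labels = ["positive"] * 4 + ["negative"] * 4

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svm", LinearSVC())])

# Keys follow the "<step name>__<parameter>" convention for pipeline steps.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "svm__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(reviews, labels)

print(search.best_params_)   # best combination found on this toy data
print(search.best_score_)    # mean cross-validated accuracy for that combination
```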

Evaluation 📈

The performance of each model is evaluated using metrics such as:

Accuracy
Precision
Recall
F1 Score

The confusion matrix is also used to visualize the performance of the models.
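
To make the relationship between these metrics and the confusion matrix concrete, the sketch below computes all four from the matrix counts in plain Python. The function names and the example labels are hypothetical; in practice scikit-learn's `accuracy_score`, `confusion_matrix`, and `classification_report` provide the same quantities.

```python
def confusion_counts(y_true, y_pred, positive="positive"):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true = ["positive", "positive", "positive", "negative",
          "negative", "negative", "negative", "positive"]
y_pred = ["positive", "positive", "negative", "negative",
          "negative", "positive", "negative", "positive"]

counts = confusion_counts(y_true, y_pred)
results = binary_metrics(*counts)
print(counts)    # (3, 1, 1, 3)
print(results)   # every metric is 0.75 for this example
```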

📈 Results

The models are compared on accuracy, confusion matrices, and classification reports to gauge how well each one classifies sentiment.

Contributing 🤝

Contributions are welcome! If you have suggestions for improvements, feel free to fork the repository and create a pull request.

🙏 Acknowledgements

A big shoutout to:

Dataset: The amazing IMDB movie reviews dataset, courtesy of Kaggle.
Libraries: Our project’s backbone includes pandas, numpy, matplotlib, scikit-learn, and nltk.
Inspiration: Inspired by fantastic sentiment analysis tutorials and groundbreaking NLP research.

👨‍💻 Author

Santhosh VS - Connect with me on LinkedIn

📧 Contact

Got questions or feedback? Drop me a line at santhosh02vs@gmail.com. I’d love to hear from you!

Itssanthoshhere/Sentiment-Analysis-with-IMDB-Movie-Reviews