IMDB Sentiment Analysis using Naive Bayes

Table of Contents

- About The Project
- Getting Started
- Usage
- Contributing
- License
- Contact

About The Project

This repository contains an implementation of a Naive Bayes classifier for sentiment analysis on the IMDB movie reviews dataset. Scikit-learn provides the classifier and the vectorizer, and TensorFlow provides the dataset.

Getting Started

To get a local copy up and running, install the dependencies listed below and run the script as described under Usage.

Dependencies

To run this code, you need to have the following packages installed:

- NumPy
- TensorFlow (2.x)
- Scikit-learn
- NLTK

You can install these packages using pip:

pip install numpy tensorflow scikit-learn nltk

Dataset

The IMDB movie reviews dataset consists of 50,000 movie reviews, with 25,000 for training and 25,000 for testing. Each review is labeled as either positive (1) or negative (0), indicating the sentiment of the reviewer.

Naive Bayes Classifier

The classifier is the Multinomial Naive Bayes implementation from the Scikit-learn library. The reviews are vectorized with Scikit-learn's CountVectorizer, and common English stopwords are removed using the NLTK library.
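
To make the computation concrete, here is a minimal from-scratch sketch of what a multinomial Naive Bayes classifier estimates: class priors and Laplace-smoothed word likelihoods. It is not the project's code, which relies on Scikit-learn's MultinomialNB. The function names `train_multinomial_nb` and `predict`, the smoothing value, and the toy reviews are assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods from tokenized docs."""
    vocab = sorted({tok for doc in docs for tok in doc})
    class_doc_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # class -> token -> count
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_doc_counts.items()}
    log_likelihood = {}
    for c in class_doc_counts:
        # Denominator: all tokens seen in class c plus alpha per vocabulary word.
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            tok: math.log((token_counts[c][tok] + alpha) / total) for tok in vocab
        }
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood):
    """Return the class with the highest log score; out-of-vocabulary tokens are skipped."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][tok] for tok in doc if tok in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy, made-up reviews (already tokenized); 1 = positive, 0 = negative.
train_docs = [
    ["great", "story", "great", "acting"],
    ["wonderful", "film"],
    ["boring", "plot", "terrible", "acting"],
    ["dull", "awful", "film"],
]
train_labels = [1, 1, 0, 0]

log_prior, log_likelihood = train_multinomial_nb(train_docs, train_labels)
positive_guess = predict(["wonderful", "story"], log_prior, log_likelihood)
negative_guess = predict(["terrible", "dull"], log_prior, log_likelihood)
print(positive_guess, negative_guess)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters for reviews that are hundreds of words long.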

Training and Evaluation

The classifier is trained on the training dataset and evaluated on the test dataset. The test accuracy is calculated and printed.

Usage

To train and evaluate the model, run the script in a Python environment with the required dependencies installed:

python imdb_naive_bayes.py

This will train the model and print the test accuracy upon completion.

Preprocessing Steps

- Decode the reviews using the IMDB dataset's word index.
- Merge the training and test datasets to create a combined dataset for preprocessing.
- Vectorize the reviews using CountVectorizer and remove common English stopwords using the NLTK library.
- Split the combined dataset back into training and testing sets.
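
The decode, merge, and split steps above can be sketched without downloading anything. This is an illustration rather than the project's code: `toy_word_index`, `decode_review`, and the encoded reviews are hypothetical, and the offset of 3 assumes the Keras IMDB loader's default settings, where ids 0, 1, and 2 are reserved for padding, start-of-sequence, and unknown words.

```python
# Hypothetical miniature word index; the real one comes from the dataset loader.
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4, "boring": 5}

# Assumption: word ids in the encoded reviews are shifted by 3 because
# ids 0 (padding), 1 (start), and 2 (unknown) are reserved.
INDEX_OFFSET = 3


def decode_review(encoded, word_index, offset=INDEX_OFFSET):
    """Map an integer sequence back to a space-separated string."""
    reverse_index = {idx + offset: word for word, idx in word_index.items()}
    # Reserved ids have no entry in reverse_index, so they are dropped here.
    return " ".join(reverse_index[i] for i in encoded if i in reverse_index)


train_encoded = [[1, 4, 5, 6, 7]]  # <start> the movie was great
test_encoded = [[1, 4, 5, 6, 8]]   # <start> the movie was boring

train_texts = [decode_review(r, toy_word_index) for r in train_encoded]
test_texts = [decode_review(r, toy_word_index) for r in test_encoded]

# Merge for joint preprocessing, then split back at the original boundary.
combined = train_texts + test_texts
n_train = len(train_texts)
train_again, test_again = combined[:n_train], combined[n_train:]

print(train_again, test_again)
```

Remembering the size of the training set before merging is what makes the final split exact, since the combined list keeps the original order.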

Code Overview

- Import the necessary libraries (NumPy, TensorFlow, Scikit-learn, NLTK).
- Load the IMDB dataset and word index.
- Define a function to decode a review using the word index.
- Prepare the data by decoding the reviews, merging the datasets, and vectorizing the text.
- Train a Multinomial Naive Bayes classifier on the training dataset.
- Test the model on the test dataset and calculate the test accuracy.
- Print the test accuracy.
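
The vectorize, train, and evaluate steps above could look roughly like the sketch below. It is not the contents of imdb_naive_bayes.py: it substitutes a few made-up reviews for the decoded IMDB data so that it runs without a download, and it uses Scikit-learn's built-in English stopword list where the project uses NLTK's. All variable names are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Made-up stand-ins for the decoded IMDB reviews; 1 = positive, 0 = negative.
train_texts = [
    "wonderful film",
    "great story and great acting",
    "boring plot and terrible acting",
    "dull and awful film",
]
train_labels = [1, 1, 0, 0]
test_texts = ["wonderful story", "terrible and dull"]
test_labels = [1, 0]

# Merge, vectorize once so both sets share a vocabulary, then split back.
combined = train_texts + test_texts
vectorizer = CountVectorizer(stop_words="english")  # assumption: built-in list, not NLTK's
features = vectorizer.fit_transform(combined)
n_train = len(train_texts)
X_train, X_test = features[:n_train], features[n_train:]

# Train on the training rows only, then score the held-out rows.
classifier = MultinomialNB()
classifier.fit(X_train, train_labels)

predictions = classifier.predict(X_test)
accuracy = accuracy_score(test_labels, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

To stay closer to the approach described here, the list returned by `nltk.corpus.stopwords.words("english")` can be passed as `stop_words` instead, after running `nltk.download("stopwords")` once.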

Performance

Multinomial Naive Bayes is well suited to text classification, especially when the data is large and sparse, as is the bag-of-words representation produced by CountVectorizer. It is therefore expected to perform well on the IMDB movie review dataset. Actual accuracy may vary with the dataset used and the preprocessing applied.

License

This project is licensed under the MIT License. You are free to use, modify, and distribute the code as long as the original copyright and permission notice are included. See LICENSE.txt for more information.

Contributing

If you have any suggestions or improvements for this project, feel free to contribute. You can fork the repository, make your changes, and submit a pull request. We appreciate your contributions and will review them as soon as possible.

Steps for contributing:

- Fork the repository on GitHub.
- Clone your fork of the repository: git clone https://github.com/<your-username>/IMDBSentimentAnalysis.git
- Create a new branch for your changes: git checkout -b my-feature-branch
- Make your changes to the code or documentation.
- Commit your changes: git commit -am 'Add my new feature'
- Push your changes to your fork: git push origin my-feature-branch
- Create a new pull request on the original repository.

For any issues or questions, please open an issue on the GitHub repository.

Acknowledgements

The dataset used in this project is loaded through the TensorFlow library and comes from the IMDB Movie Review Dataset. The implementation relies on the Multinomial Naive Bayes classifier from the Scikit-learn library and on preprocessing techniques from the NLTK library.

Daniel LaForce

Daniel LaForce - https://github.com/KeyArgo

Please feel free to reach out with any questions, suggestions, or feedback.

Contact

Daniel LaForce - danlaforce3@gmail.com

Project Link: https://github.com/KeyArgo/IMDBSentimentAnalysis