This project performs sentiment analysis on IMDb movie reviews using Python and the Natural Language Toolkit (NLTK) library. The goal is to classify movie reviews as positive or negative based on their text content.
- Dataset: IMDb movie reviews dataset from NLTK.
- Approach: Naive Bayes classifier for sentiment analysis.
- Metrics: Accuracy and confusion matrix.
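Accuracy can be read directly off a confusion matrix: it is the sum of the diagonal (correct predictions) divided by the total number of predictions. The following is a minimal pure-Python sketch of that relationship. The labels are made-up illustrations, not results from this project. Rows are true labels and columns are predicted labels, the same layout scikit-learn's `confusion_matrix` uses. `confusion_counts` and `accuracy` are hypothetical helper names.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, labels=("neg", "pos")):
    """Tabulate a confusion matrix: rows are true labels, columns are predictions."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(true, pred)] for pred in labels] for true in labels]


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Illustrative toy labels, not results from this project.
y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
cm = confusion_counts(y_true, y_pred)
print(cm)            # [[1, 1], [1, 2]]
print(accuracy(cm))  # 0.6
```

Here 3 of the 5 predictions fall on the diagonal, so the accuracy is 3 / 5 = 0.6. The off-diagonal cells show *which* mistakes were made (one false positive and one false negative), which accuracy alone does not reveal.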
```bash
pip install nltk scikit-learn
```
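`pip` installs the libraries, but not NLTK's corpora and models, which are fetched separately with `nltk.download`. A minimal sketch, assuming the steps below need the `movie_reviews` corpus, the English `stopwords` list, and WordNet for lemmatization (plus `omw-1.4`, which some NLTK releases require alongside WordNet). `NLTK_RESOURCES` and `download_nltk_resources` are hypothetical names.

```python
import nltk

# NLTK data packages assumed to be needed: the review corpus, the English
# stop-word list, and WordNet for lemmatization. "omw-1.4" is only required
# by WordNet in some NLTK releases but is harmless to fetch.
NLTK_RESOURCES = ("movie_reviews", "stopwords", "wordnet", "omw-1.4")


def download_nltk_resources(resources=NLTK_RESOURCES):
    """Download each NLTK package and return a {name: succeeded} mapping.

    nltk.download skips packages that are already up to date.
    """
    return {name: nltk.download(name, quiet=True) for name in resources}
```

Call `download_nltk_resources()` once before running the pipeline. A `False` value in the returned mapping marks a package that could not be downloaded.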
- Dataset Preparation: The IMDb movie reviews dataset is loaded and preprocessed.
- Feature Extraction: The text is tokenized, stop words are removed, words are lemmatized, and the result is vectorized.
- Model Training: A Naive Bayes classifier is trained on the vectorized text data.
- Model Evaluation: Classifier performance is evaluated using accuracy and a confusion matrix.
- Results Visualization: The confusion matrix is plotted to visualize the model's performance.
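The workflow above can be sketched end to end as follows. This is a minimal sketch, not the project's actual code. It assumes scikit-learn's `CountVectorizer` and `MultinomialNB` for vectorization and the Naive Bayes classifier, NLTK's `movie_reviews` corpus with its pre-tokenized words, WordNet lemmatization with the default noun part of speech, a stratified 80/20 train/test split, and matplotlib for the plot (matplotlib is not in the `pip` command above). It also assumes the `movie_reviews`, `stopwords` and `wordnet` NLTK data are already installed. `LABELS`, `preprocess`, `identity`, `build_model` and `main` are hypothetical names.

```python
from nltk.corpus import movie_reviews, stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

LABELS = ["neg", "pos"]  # category names used by NLTK's movie_reviews corpus


def preprocess(tokens, stop_words, lemmatize):
    """Lowercase tokens, drop stop words and non-alphabetic tokens, then lemmatize."""
    lowered = (token.lower() for token in tokens)
    return [lemmatize(token) for token in lowered if token.isalpha() and token not in stop_words]


def identity(tokens):
    """Documents are already token lists, so the vectorizer must not re-tokenize them."""
    return tokens


def build_model():
    """Bag-of-words counts followed by a multinomial Naive Bayes classifier."""
    return make_pipeline(CountVectorizer(analyzer=identity), MultinomialNB())


def main(test_size=0.2, seed=42):
    # Assumes the movie_reviews, stopwords and wordnet NLTK data are installed.
    stop_words = set(stopwords.words("english"))
    lemmatize = WordNetLemmatizer().lemmatize  # noun POS by default

    # Dataset preparation and feature extraction: one document per review file.
    docs, labels = [], []
    for fileid in movie_reviews.fileids():
        docs.append(preprocess(movie_reviews.words(fileid), stop_words, lemmatize))
        labels.append(movie_reviews.categories(fileid)[0])

    x_train, x_test, y_train, y_test = train_test_split(
        docs, labels, test_size=test_size, random_state=seed, stratify=labels
    )

    # Model training and evaluation.
    model = build_model().fit(x_train, y_train)
    predicted = model.predict(x_test)
    print(f"Accuracy: {accuracy_score(y_test, predicted):.3f}")
    print(confusion_matrix(y_test, predicted, labels=LABELS))

    # Results visualization; matplotlib is an extra, assumed dependency.
    import matplotlib.pyplot as plt

    ConfusionMatrixDisplay.from_predictions(y_test, predicted, labels=LABELS)
    plt.show()
```

Calling `main()` prints the accuracy and the raw confusion matrix, then opens the plot. Keeping `preprocess` and `build_model` separate from data loading lets each step be checked on small inputs without the full corpus. One design point to be aware of: NLTK's English stop-word list contains negations such as "not" and "no", so removing stop words also discards some sentiment-bearing words.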