Sentiment analysis on Amazon reviews using deep learning models (a simple RNN and an LSTM). The project preprocesses the dataset, splits it into training and validation sets, and builds a vocabulary for word embeddings. It also includes a real-time prediction feature for user-entered reviews and a report detailing the model summaries and best hyperparameters. Ideal for NLP enthusiasts exploring text classification techniques.
Sentiment analysis is a common task in natural language processing (NLP) that aims to determine the sentiment or opinion expressed in a piece of text. This project leverages deep learning techniques, specifically RNN and LSTM models, to classify the sentiment of Amazon reviews into positive, neutral, or negative categories.
Features:
- Data Preprocessing: Cleans and preprocesses the review text.
- Tokenization and Padding: Converts text data into a numerical format suitable for model training.
- Model Creation: Defines and initializes RNN and LSTM models.
- Training and Validation: Implements training loops with performance tracking.
- GUI Application: Provides a simple GUI for sentiment prediction using trained models.
- Visualization: Plots training and validation losses.
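A minimal sketch of the vocabulary-building, tokenization, and padding steps in plain Python. It assumes whitespace tokenization of already-cleaned text, ids 0 and 1 reserved for `<pad>` and `<unk>`, and right-padding to a fixed length. The names `build_vocab`, `encode`, and `pad` are hypothetical and not taken from the project's code.

```python
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # assumed special tokens with ids 0 and 1

def build_vocab(texts, min_freq=1):
    """Map every token seen at least min_freq times to an integer id, most frequent first."""
    counts = Counter(token for text in texts for token in text.split())
    vocab = {PAD: 0, UNK: 1}
    for token, freq in counts.most_common():
        if freq >= min_freq:
            vocab[token] = len(vocab)
    return vocab

def encode(text, vocab):
    """Convert whitespace-separated tokens to ids; unseen words map to <unk>."""
    return [vocab.get(token, vocab[UNK]) for token in text.split()]

def pad(ids, max_len, pad_id=0):
    """Truncate or right-pad a list of ids to exactly max_len entries."""
    ids = ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))
```

Fixed-length sequences let a batch of reviews be stacked into one rectangular array before it reaches the embedding layer.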
Usage:
- Data Preprocessing: Preprocess the dataset to clean and tokenize the text data.
- Model Training: Train the RNN and LSTM models on the preprocessed data.
- Sentiment Prediction: Use the trained models to predict the sentiment of new reviews via a GUI application.
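The cleaning and splitting steps could look roughly like the sketch below. The specific cleaning rules (lowercasing, stripping HTML tags, replacing non-alphanumeric characters), the 80/20 split, and the seed are assumptions for illustration; the project's actual preprocessing may differ.

```python
import random
import re

def clean_review(text):
    """Lowercase, drop HTML tags, keep only ASCII letters, digits and whitespace."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)      # remove tags such as <br />
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation and symbols with spaces
    return " ".join(text.split())             # collapse repeated whitespace

def train_val_split(rows, val_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and split off a validation set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = int(len(rows) * val_fraction)
    return rows[n_val:], rows[:n_val]  # (train, validation)
```

Note that the character filter also splits contractions (`"Don't"` becomes `"don t"`), which may or may not be desirable for sentiment; a fixed seed keeps the split reproducible across runs.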
The project uses the Amazon reviews dataset, which contains customer reviews and ratings. The dataset includes the following columns:
- `sentiments`: The sentiment of the review (positive, neutral, negative).
- `cleaned_review`: The preprocessed review text.
- `cleaned_review_length`: The length of the cleaned review.
- `review_score`: The review score given by the customer.
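Before training with cross-entropy, the text labels in `sentiments` need integer class ids. The sketch below assumes a fixed class order, defines `cleaned_review_length` as a word count, and shows one hypothetical use of that column: choosing a padding length that covers most reviews. None of these helper names come from the project.

```python
import math

LABELS = ["negative", "neutral", "positive"]  # assumed class order
LABEL_TO_ID = {label: i for i, label in enumerate(LABELS)}

def encode_label(sentiment):
    """Map a value of the `sentiments` column to an integer class id."""
    return LABEL_TO_ID[sentiment.strip().lower()]

def review_length(cleaned_review):
    """Word count of a cleaned review (one plausible definition of cleaned_review_length)."""
    return len(cleaned_review.split())

def pick_max_len(lengths, coverage=0.95):
    """Smallest listed length such that at least `coverage` of reviews fit untruncated."""
    ordered = sorted(lengths)
    index = max(0, math.ceil(coverage * len(ordered)) - 1)
    return ordered[index]
```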
The project defines and trains two types of models:
- RNN Model: A simple Recurrent Neural Network for sentiment classification.
- LSTM Model: A Long Short-Term Memory network, which is more effective for capturing long-term dependencies in text data.
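For reference, these are the standard recurrences for a simple (Elman) RNN and an LSTM cell at time step $t$, with input $x_t$, hidden state $h_t$, cell state $c_t$, logistic sigmoid $\sigma$, and element-wise product $\odot$. They describe the textbook architectures; the project's layer sizes and other settings are not specified here.

```latex
\begin{aligned}
\text{RNN:}\quad h_t &= \tanh(W x_t + U h_{t-1} + b) \\[4pt]
\text{LSTM:}\quad i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The LSTM's cell state is updated additively, so when the forget gate $f_t$ stays close to 1, information can be carried across many time steps. The simple RNN instead passes its state through $\tanh$ and the same matrix $U$ at every step, which tends to make gradients vanish over long reviews.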
Both models are trained using the Adam optimizer and cross-entropy loss function.
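As a worked illustration of the loss and the accuracy metric reported in the tables, here is multi-class cross-entropy (softmax followed by negative log-likelihood) and accuracy in plain Python. This assumes the models output raw, unnormalized scores (logits), which is what PyTorch's `nn.CrossEntropyLoss` expects; the source does not state which framework is used.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability of the true class index for a single example."""
    return -math.log(softmax(logits)[target])

def batch_metrics(batch_logits, targets):
    """Mean cross-entropy loss and accuracy in percent over a batch."""
    losses = [cross_entropy(lg, t) for lg, t in zip(batch_logits, targets)]
    predictions = [max(range(len(lg)), key=lg.__getitem__) for lg in batch_logits]
    correct = sum(p == t for p, t in zip(predictions, targets))
    return sum(losses) / len(losses), 100.0 * correct / len(targets)
```

With equal scores for all three classes, `cross_entropy([0.0, 0.0, 0.0], 1)` equals $\ln 3 \approx 1.0986$, the loss of an uninformed three-class classifier, which puts the epoch-1 losses in the tables in context.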
A simple GUI application is provided for predicting the sentiment of new reviews. The user can input a review, select the model (RNN or LSTM), and get the predicted sentiment.
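A sketch of the prediction path behind the GUI, including model selection by name. It assumes each trained model can be called on a list of token ids and returns one score per class, and that `encode_fn` applies the same cleaning, encoding, and padding used during training; `predict_sentiment`, `models`, and `encode_fn` are hypothetical names. A GUI button callback (for example in tkinter) would read the text box and the selected model name and display the returned label.

```python
LABELS = ("negative", "neutral", "positive")  # hypothetical order; must match the training labels

def predict_sentiment(review, model_name, models, encode_fn):
    """Score one user-entered review with the selected model and return its label.

    models:    dict mapping a name such as "RNN" or "LSTM" to a callable model
               that takes a list of token ids and returns one raw score per class
    encode_fn: callable applying the training-time cleaning, encoding and padding
    """
    model = models[model_name]
    scores = model(encode_fn(review))
    best = max(range(len(scores)), key=scores.__getitem__)  # argmax over classes
    return LABELS[best]
```

If the models are PyTorch modules, call `model.eval()` before inference and run the forward pass inside `torch.no_grad()`; converting the ids to a tensor is omitted here.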
RNN Model Observations:
- Training Loss: Decreases over the epochs (0.6734 at epoch 1 to 0.0597 at epoch 20), indicating good learning progress.
- Validation Loss: Falls during the first epochs (to 0.5355 at epoch 3) but is back up to 0.7657 by epoch 20 while the training loss keeps falling, a sign of overfitting.
- Validation Accuracy: Stabilizes around 86% (86.08% at epoch 20), reasonable performance for three-class sentiment classification.
RNN Model Performance:
Epoch | Train Loss | Val Loss | Val Accuracy |
---|---|---|---|
1 | 0.6734 | 0.5767 | 77.85% |
2 | 0.5406 | 0.5487 | 79.70% |
3 | 0.4409 | 0.5355 | 80.48% |
... | ... | ... | ... |
20 | 0.0597 | 0.7657 | 86.08% |
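LSTM Model Observations:
- Training Loss: Decreases steadily (0.5736 at epoch 1 to 0.0117 at epoch 20), indicating effective learning.
- Validation Loss: Lowest at epoch 2 (0.3995) among the listed epochs, then rises to 0.7123 by epoch 20, so the model overfits as training continues.
- Validation Accuracy: Peaks around 89% (88.72% at epoch 20), outperforming the RNN model (86.08%).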
LSTM Model Performance:
Epoch | Train Loss | Val Loss | Val Accuracy |
---|---|---|---|
1 | 0.5736 | 0.4878 | 79.76% |
2 | 0.3643 | 0.3995 | 85.21% |
3 | 0.2299 | 0.4260 | 85.88% |
... | ... | ... | ... |
20 | 0.0117 | 0.7123 | 88.72% |
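To make the checkpoint choice explicit, the sketch below ranks the epochs listed in the two tables by validation loss and by validation accuracy, and computes the gap between validation and training loss. It uses only the rows shown above (the elided epochs 4 to 19 are not included), and the helper names are hypothetical.

```python
# Rows copied from the tables above: (epoch, train_loss, val_loss, val_acc_percent).
# Epochs 4-19 are elided in the tables, so they are absent here as well.
RNN_ROWS = [
    (1, 0.6734, 0.5767, 77.85),
    (2, 0.5406, 0.5487, 79.70),
    (3, 0.4409, 0.5355, 80.48),
    (20, 0.0597, 0.7657, 86.08),
]
LSTM_ROWS = [
    (1, 0.5736, 0.4878, 79.76),
    (2, 0.3643, 0.3995, 85.21),
    (3, 0.2299, 0.4260, 85.88),
    (20, 0.0117, 0.7123, 88.72),
]

def best_epochs(rows):
    """Return (epoch with the lowest val loss, epoch with the highest val accuracy)."""
    by_loss = min(rows, key=lambda r: r[2])[0]
    by_acc = max(rows, key=lambda r: r[3])[0]
    return by_loss, by_acc

def generalization_gap(rows):
    """Val loss minus train loss per epoch; a growing gap signals overfitting."""
    return {epoch: round(val - train, 4) for epoch, train, val, _ in rows}
```

On the listed rows, the lowest validation loss comes early (epoch 3 for the RNN, epoch 2 for the LSTM) while the highest validation accuracy comes at epoch 20 for both, and the loss gap grows to about 0.71 (RNN) and 0.70 (LSTM). Which checkpoint counts as "best" therefore depends on the selection criterion, and the full per-epoch log would be needed to confirm it. Validation loss rising alongside accuracy often means the model is growing overconfident on the reviews it still misclassifies.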