This project analyzes and classifies user reviews from Novo's Trustpilot page. The primary objective is to identify the intent behind each review (for example, 'Lost Password', 'Account Setup', or 'Positive Feedback') using natural language processing (NLP) techniques and machine learning models.
- Data Preprocessing: Cleaning the text data by removing non-alphabetical characters and converting text to lowercase.
- Tokenization and Lemmatization: Breaking down the text into tokens and normalizing words to their base form.
- Word Embeddings: Creating word embeddings using Word2Vec to convert text into numerical vectors.
- Intent Classification: Using LSTM (Long Short-Term Memory) neural networks to classify the intent of each review.
- Performance Metrics: Evaluating the model using accuracy and loss metrics.
- Validation: Splitting the dataset into training and testing sets to validate the model's performance.
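The cleaning, tokenization, and train/test split steps listed above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names, the 20% test fraction, and the per-intent (stratified) split are assumptions. Lemmatization, Word2Vec training, and the LSTM itself are left out because they depend on external libraries.

```python
import random
import re
from collections import defaultdict


def clean_text(text):
    """Lowercase the text and replace runs of non-alphabetic characters with a space."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def tokenize(text):
    """Split cleaned text into whitespace-separated tokens."""
    return clean_text(text).split()


def stratified_split(samples, test_fraction=0.2, seed=42):
    """Split (text, intent) pairs into train and test sets, intent by intent.

    Hypothetical helper: the 0.2 fraction and fixed seed are illustrative defaults.
    """
    by_intent = defaultdict(list)
    for text, intent in samples:
        by_intent[intent].append((text, intent))

    rng = random.Random(seed)
    train, test = [], []
    for intent in sorted(by_intent):
        group = by_intent[intent]
        rng.shuffle(group)
        # Hold out at least one example per intent, but never the whole group.
        n_test = min(len(group) - 1, max(1, round(len(group) * test_fraction)))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


tokens = tokenize("Great service, highly recommend!")
```

Splitting per intent keeps rare intents represented in both sets, which matters when some categories have far fewer reviews than others.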
- Data Imbalance: Some intent categories had significantly more examples than others. Addressing this required careful data sampling and weighting during model training.
- Text Ambiguity: User reviews often contain ambiguous language, making intent classification challenging. The use of advanced NLP techniques and fine-tuning of models helped mitigate this issue.
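One common way to apply the weighting mentioned under Data Imbalance is inverse-frequency class weights, where each intent's weight is `total / (n_classes * count)`. The sketch below assumes that heuristic; the source does not say which weighting scheme was actually used, and the label counts are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per intent that is inversely proportional to its frequency."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {intent: total / (n_classes * count) for intent, count in counts.items()}


# Hypothetical label distribution, not taken from the real dataset.
labels = ["Positive Feedback"] * 6 + ["Account Setup"] * 3 + ["Lost Password"] * 1
weights = balanced_class_weights(labels)
```

With these weights, errors on rare intents contribute more to the training loss than errors on frequent ones. If the LSTM were trained with Keras, a dictionary like this, keyed by integer class index instead of intent name, could be passed as the `class_weight` argument of `Model.fit`.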
- Accuracy: The final model achieved an accuracy of approximately 65% on the test dataset.
- Loss: The model's loss on the test dataset was approximately 1.29.
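To make the two figures above concrete, here is a small sketch of how accuracy and loss could be computed from predicted class probabilities. It assumes the reported loss is categorical cross-entropy, which the source does not state, and the probabilities are invented for illustration.

```python
import math


def accuracy(true_labels, predicted_probs):
    """Fraction of samples whose highest-probability class matches the true class index."""
    correct = 0
    for true_idx, probs in zip(true_labels, predicted_probs):
        predicted_idx = max(range(len(probs)), key=probs.__getitem__)
        correct += predicted_idx == true_idx
    return correct / len(true_labels)


def cross_entropy(true_labels, predicted_probs, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for true_idx, probs in zip(true_labels, predicted_probs):
        # Clamp to eps so a zero probability does not raise a math domain error.
        total += -math.log(max(probs[true_idx], eps))
    return total / len(true_labels)


# Hypothetical predictions for three reviews over three intents.
true_labels = [0, 1, 2]
predicted_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.5, 0.3, 0.2],
]
acc = accuracy(true_labels, predicted_probs)
loss = cross_entropy(true_labels, predicted_probs)
```

For reference, a model that assigns equal probability to each of K intents has a cross-entropy of ln K, so a loss value is easiest to interpret alongside the number of intent categories.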
- Positive Feedback: "Great service, highly recommend!"
- Account Setup: "It was easy to set up a new account."
- Service Issues: "I had problems with the customer service."
- Continuous Improvement: Regularly updating the model with new reviews and intents.
- Enhanced Features: Incorporating additional features such as review metadata and user profiles to improve intent classification.