Ideally, customers would always be satisfied with a company's services; however, this is rarely the case.
Twitter is a cornerstone of communication, as many customers share their issues on the platform. For customers, Twitter is accessible and offers a way to communicate directly with large companies such as Google and Apple.
Manually screening each tweet can be costly and time-consuming.
Solution
We wanted to create a Machine Learning algorithm that can correctly detect sentiment within tweets.
By accomplishing this, a company can better screen customer feedback and attend to any issues customers may be having.
After the first test, we pruned KNN, DecisionTree, and LDA. These models all had a majority of scores below 75% and very wide ranges.
Test 2
After the second test, we pruned ADA, Bagging, and the Gaussian classifier due to their wide ranges and low accuracies.
Test 3
Finally, we pruned only the Bagging Classifier, leaving us with LogisticRegression, MultinomialNB, LinearSVC, SGDClassifier, RidgeClassifier, and RandomForestClassifier.
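The pruning rounds above can be sketched as one repeated-cross-validation comparison loop. This is only an illustrative sketch: the synthetic data from `make_classification` stands in for the real tweet features, and the model list is a subset of the survivors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Hypothetical stand-in for the real tweet feature matrix and sentiment labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X = abs(X)  # MultinomialNB requires non-negative features

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "MultinomialNB": MultinomialNB(),
    "LinearSVC": LinearSVC(),
    "RidgeClassifier": RidgeClassifier(),
    "RandomForestClassifier": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Repeated stratified k-fold gives a distribution of scores per model,
# which is what the "wide range" pruning criterion is based on.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv)
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Models whose score distribution is both low and wide would then be dropped, as described above.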
Although the stacked classifier did not perform much better than other classifiers alone, the range was tighter and provided more consistent predictions
Tuned Stacked Classifier
For each model mentioned above, we tuned the hyperparameters and then repeated the RepeatedStratifiedKFold evaluation.
Again, we see that the stacked classifier did not perform much better than the other classifiers alone, but there was evidence of a tighter range and thus more consistent predictions
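The tuning step for a single model can be sketched with a grid search over the same repeated stratified CV. The parameter grid values here are illustrative guesses, and synthetic data again stands in for the real features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold

# Hypothetical stand-in for the real tweet features and labels.
X, y = make_classification(n_samples=300, random_state=0)

# Same evaluation scheme as before, now wrapped in a hyperparameter search.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},  # illustrative regularization grid
    scoring="accuracy",
    cv=cv,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Repeating this per base model yields the tuned estimators that feed the stacked classifier.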
The stacked classifier was chosen as the final model because stacked classifiers improve with variation. Although it did not do much better in our experiments, with the inclusion of more unseen data it might outperform the individual classifiers.
Because various models are used within our stacked classifier, we found the top 50 most important features (words) for each model using scikit-learn's feature_selection module and counted the frequency of each word.
As seen below, the words "headache", "hate", and "battery" each showed up 6 times. This means that across all of the models within the stacked classifier, these words were among the top 50 most important features 6 times.
2. Local Interpretable Model-Agnostic Explanations (LIME)
For more information regarding how to use LIME, please visit their GitHub repo.
Below, you can see the feature weights and their corresponding importance for making a prediction. The words make sense considering their connotations within the English language.
Limitations & Future Directions
Limitations
Data was collected ~7 years ago. This can be problematic because words and phrases change from year to year, so an algorithm such as this one can only be applied to understanding the sentiment of tweets made in 2013.
Huge data imbalance. Our strategy to fix this imbalance was to undersample the class with the higher frequency (the positive class). However, in doing this, we limited our data to only 1000 points, which could have been a bottleneck during training.
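The undersampling step can be sketched as follows, assuming the tweets live in a pandas DataFrame with a "sentiment" column; the toy frame and its class sizes are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical imbalanced frame: many more positive tweets than negative.
df = pd.DataFrame({
    "text": [f"tweet {i}" for i in range(130)],
    "sentiment": ["positive"] * 100 + ["negative"] * 30,
})

# Shrink the majority class to the size of the minority class.
minority_n = df["sentiment"].value_counts().min()
pos = df[df["sentiment"] == "positive"].sample(n=minority_n, random_state=42)
neg = df[df["sentiment"] == "negative"]
balanced = pd.concat([pos, neg])

print(balanced["sentiment"].value_counts())
```

The trade-off described above is visible here: balancing by undersampling throws away majority-class rows, so the total training set shrinks.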
Future Directions
MORE DATA!
Refine the model to include specific slang that is relevant to the current era
Analyze specifically what users are frustrated or happy with (iPhone, Android, etc.)
Create a ternary classification problem that detects not only positive and negative sentiment, but neutral sentiment as well