Sentiment analysis of chat data including text, smileys, emojis and images (gifs), with the added difficulty of sarcasm.
Emoji sentiment using Emoji Sentiment Ranking.
Image sentiment using C3D, a 3D-CNN.
Text sentiment using DeepMoji finetuned on SS-YouTube and SS-Twitter.
- Download DeepMoji SS-Twitter Keras Model and DeepMoji SS-YouTube Keras Model and place them in SentimentAnalysis/Text/sentiment/finetuned.
- Download DeepMoji Weights and place them in SentimentAnalysis/Text/model.
- Download C3D Sentiment Model and place in SentimentAnalysis/Image.
- See testSentimentAnalysis.py for an example.
- Main implementation of sentiment analysis.
- See testEmojiSentiment.py for an example.
- EmojiSentiment.py: Extract sentiment from emojis.
- config.py: Contains emoji to sentiment mappings.
- build: Build files used to generate emoji sentiment mappings.
- See testImageSentiment.py for an example.
- ImageSentiment.py: Extract sentiment from images (gifs).
- training: Files related to training the classification model.
- See testTextSentiment.py for an example.
- Contains modified version of DeepMoji Python 3 repo.
- sentiment/TextSentiment.py: Extract sentiment from text and smileys.
- sentiment/build: Example of manually entered training data for finetuning.
- sentiment/finetuned: Finetuned Keras models used for classification.
- Performance of the model was tested on 100 tweets containing emojis from this dataset.
- This paper showed that emoticon blocking (using the emoji sentiment as the overall sentiment indicator for a sentence) is an effective method of sentiment detection.
- This was tested and the results can be observed below.
- Emoticon blocking appears to perform better on this small dataset, which suggests it may also perform better on a larger dataset.
- Emoticon blocking also handles sarcasm. For example, "I hate it when you do that 😉" is considered positive overall, whereas it would be classified as negative if only the text were considered.
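The emoticon blocking rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the `EMOJI_SCORES` values, the function names and the `text_score` argument (standing in for the output of a text classifier) are all assumptions made up for the example.

```python
from typing import Optional

# Hypothetical emoji -> sentiment scores in [-1, 1]; values are invented for illustration.
EMOJI_SCORES = {"😉": 0.5, "😒": -0.4}


def emoji_sentiment(sentence: str) -> Optional[float]:
    """Mean score of the known emojis in the sentence, or None if there are none."""
    scores = [EMOJI_SCORES[ch] for ch in sentence if ch in EMOJI_SCORES]
    return sum(scores) / len(scores) if scores else None


def emoticon_blocking(sentence: str, text_score: float) -> str:
    """Let the emoji sentiment override the text sentiment whenever it exists."""
    emoji_score = emoji_sentiment(sentence)
    score = emoji_score if emoji_score is not None else text_score
    return "positive" if score >= 0 else "negative"


# text_score=-0.8 is an assumed classifier output for the negative wording.
print(emoticon_blocking("I hate it when you do that 😉", text_score=-0.8))  # positive
print(emoticon_blocking("I hate it when you do that", text_score=-0.8))     # negative
```

With the emoji present the sentence is labelled positive, and without it the text score decides, which mirrors the sarcasm example above.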
- 5000 images used in training with 2500 of each class.
- Model trained and evaluated on balanced data set using a training/validation split of 70/30.
- Below is an example of the top negative and positive images from the validation data.
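For reference, the balanced 70/30 split mentioned above can be sketched like this. It is only an illustration under assumptions: the file names are placeholders and `balanced_split` is a hypothetical helper, not a function from the training files in this project.

```python
import random


def balanced_split(items_by_class, train_fraction=0.7, seed=0):
    """Split each class separately so both subsets keep the same class balance."""
    rng = random.Random(seed)
    train, val = [], []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(path, label) for path in shuffled[:cut]]
        val += [(path, label) for path in shuffled[cut:]]
    return train, val


# Placeholder file names: 2500 images per class, as in the description above.
data = {
    "positive": [f"pos_{i}.gif" for i in range(2500)],
    "negative": [f"neg_{i}.gif" for i in range(2500)],
}
train, val = balanced_split(data)
print(len(train), len(val))  # 3500 1500
```

Splitting per class gives 1750 training and 750 validation images for each label.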
- How is sentiment calculated?
- Text, emojis and images are extracted from a chat sentence like the example given below.
we lost 😒 😅 😛 <img>https://media.giphy.com/media/2rtQMJvhzOnRe/giphy.gif</img>
- Sentiment for each is calculated and a score is returned based on the rules in SentimentAnalysis.calculate_scores().
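The extraction step above could look something like the sketch below. It is not the implementation in SentimentAnalysis; `split_message` is a hypothetical name, and the emoji character ranges are a rough assumption that does not cover every emoji (for example, sequences using variation selectors).

```python
import re

# Matches the <img>...</img> wrapper used in the example chat sentence.
IMG_TAG = re.compile(r"<img>(.*?)</img>")
# Rough emoji ranges (assumption, not exhaustive).
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def split_message(message):
    """Return (text, emojis, image_urls) for one chat sentence."""
    images = IMG_TAG.findall(message)
    without_images = IMG_TAG.sub(" ", message)
    emojis = EMOJI.findall(without_images)
    # Drop the emojis and collapse the leftover whitespace.
    text = " ".join(EMOJI.sub(" ", without_images).split())
    return text, emojis, images


text, emojis, images = split_message(
    "we lost 😒 😅 😛 <img>https://media.giphy.com/media/2rtQMJvhzOnRe/giphy.gif</img>"
)
print(text)    # we lost
print(emojis)  # ['😒', '😅', '😛']
print(images)  # ['https://media.giphy.com/media/2rtQMJvhzOnRe/giphy.gif']
```

Each of the three parts can then be passed to its own sentiment model before the scores are combined.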
- How was the image model trained?
- The C3D model was finetuned using images from GIPHY.
- Labelled GIPHY images were obtained from GIFGIF.
- An R script to extract the links from JSON has been included in this project.
- Text files are generated containing links to the images, which can be downloaded from the terminal using:
  cat file_name.txt | parallel --gnu "wget {}"
- Or the exact files used for training and validation can be downloaded here.
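If GNU parallel is not available, the same download can be sketched in Python. This is an assumed alternative, not part of the project: `local_name` and `download_all` are hypothetical helpers, and naming each file after the media ID assumes the links look like the giphy.gif example shown earlier, where every URL ends in the same file name.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.parse import urlparse


def local_name(url):
    """Name the file after the second-to-last path segment (the media ID in the example link)."""
    parts = urlparse(url).path.strip("/").split("/")
    return f"{parts[-2]}.gif"


def download_all(link_file, out_dir, workers=8):
    """Download every link listed in link_file (one URL per line) into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    lines = Path(link_file).read_text().splitlines()
    urls = [line.strip() for line in lines if line.strip()]

    def fetch(url):
        target = out / local_name(url)
        urllib.request.urlretrieve(url, str(target))
        return target

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))


print(local_name("https://media.giphy.com/media/2rtQMJvhzOnRe/giphy.gif"))  # 2rtQMJvhzOnRe.gif
```

Calling `download_all("file_name.txt", "images")` would then fetch the files; it is left out of the example because it needs network access.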
- Why is there no sentiment score for some emojis?
- In my opinion, not all emojis are good indicators of sentiment.
- Only emojis with obvious indicators of sentiment such as facial expressions, popular symbols and hand signs were used.
- But what about sarcasm detection?
- DeepMoji has learned to understand emotions and sarcasm based on millions of emojis.
- Whether or not the text contains sarcasm is irrelevant: the features extracted using DeepMoji still accurately represent the emotions in the text. These features are used when finetuning new models.
- Using emoticon blocking also helps to calculate the actual sentiment in cases of sarcasm, e.g. "I hate it when you do that 😉" is actually positive: it contains a positive emoji but negative text. Emoticon blocking takes the emoji as the overall sentiment, which would be correct in this case.
- Why was the model only evaluated on combinations of emojis and text?
- The sentiment of images is only used if there is no sentiment available for emojis or text in a sentence. The accuracy in that case is the same as when the image model was evaluated on its own.
Email: oisin097@hotmail.com
- Sentiment analysis in the presence of sarcasm could be tackled directly using DeepMoji. The DeepMoji paper reports that the model was finetuned on SCv2-GEN to perform sarcasm detection at 75% accuracy. Text could first be passed through this model to detect sarcasm, and the sarcastic sentences then passed to another model finetuned on sentiment-labelled sentences containing sarcasm.
- In my opinion, this approach would not be very accurate, as the compounding decrease in classification accuracy (is this sarcastic? -> is this positive/negative?) (75% x ??%) would most likely give a poor result. There is also an increase in computational overhead, which reinforces the view that this trade-off would most likely not be worthwhile.
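To make the compounding concrete, here is a small sketch of the arithmetic. The 75% figure is the one quoted above; the second-stage accuracies are purely hypothetical values chosen for illustration, since the real figure is unknown, and the simple product assumes the two stages make errors independently.

```python
def pipeline_accuracy(sarcasm_acc, sentiment_acc):
    """Accuracy of two chained classifiers, assuming their errors are independent."""
    return sarcasm_acc * sentiment_acc


SARCASM_ACC = 0.75  # sarcasm detection accuracy quoted above

# Hypothetical second-stage accuracies; the real value is not known.
for sentiment_acc in (0.80, 0.90, 1.00):
    combined = pipeline_accuracy(SARCASM_ACC, sentiment_acc)
    print(f"{sentiment_acc:.2f} -> {combined:.3f}")
```

Under this simplified model, even a perfect second-stage classifier could not lift the combined accuracy above 75%.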