ChatSentimentAnalysis

Sentiment analysis of chat data including text, smileys, emojis and images (GIFs), with the added difficulty of sarcasm.


Emoji sentiment using the Emoji Sentiment Ranking.

Image sentiment using C3D, a 3D-CNN.

Text sentiment using DeepMoji finetuned on SS-YouTube and SS-Twitter.




Installation

SentimentAnalysis.py

  • See testSentimentAnalysis.py for an example.
  • Main implementation of sentiment analysis.

Emoji

  • See testEmojiSentiment.py for an example.
  • EmojiSentiment.py: Extract sentiment from emojis.
  • config.py: Contains emoji to sentiment mappings.
  • build: Build files used to generate emoji sentiment mappings.
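The emoji module maps each emoji to a sentiment value and skips emojis that have no mapping. The sketch below is a minimal illustration, assuming the mapping stores per-emoji negative/neutral/positive occurrence counts (the kind of data the Emoji Sentiment Ranking reports) and scores each emoji as p(positive) - p(negative). The counts and the names `EMOJI_COUNTS`, `emoji_score` and `emojis_sentiment` are hypothetical; they are not the contents of config.py or EmojiSentiment.py, and averaging several emojis is an assumed combination rule.

```python
from typing import List, Optional

# Hypothetical (negative, neutral, positive) occurrence counts per emoji.
# Placeholder numbers for illustration only, NOT taken from the Emoji Sentiment Ranking.
EMOJI_COUNTS = {
    "😉": (10, 30, 60),
    "😒": (60, 25, 15),
}


def emoji_score(emoji: str) -> Optional[float]:
    """Return p(positive) - p(negative) for one emoji, or None if it is unmapped."""
    counts = EMOJI_COUNTS.get(emoji)
    if counts is None:
        return None  # emoji is not treated as a sentiment indicator
    neg, neu, pos = counts
    total = neg + neu + pos
    return (pos - neg) / total


def emojis_sentiment(emojis: List[str]) -> Optional[float]:
    """Average the scores of all mapped emojis; None if none of them carry sentiment."""
    scores = [s for s in map(emoji_score, emojis) if s is not None]
    if not scores:
        return None
    return sum(scores) / len(scores)


print(emoji_score("😉"))               # 0.5
print(emojis_sentiment(["😒", "🍕"]))  # -0.45 (the pizza emoji is unmapped and ignored)
```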

Image

  • See testImageSentiment.py for an example.
  • ImageSentiment.py: Extract sentiment from images (GIFs).
  • training: Files related to training the classification model.
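C3D operates on fixed-length clips (16 frames of 112x112 pixels in the original C3D paper), whereas GIFs vary in length. The sketch below shows one way to choose which GIF frames go into a clip. Uniform sampling, with frames repeated for short GIFs, is an assumption and not necessarily what ImageSentiment.py does; `sample_frame_indices` is a hypothetical helper. The frames themselves could be read with Pillow's `ImageSequence.Iterator`.

```python
from typing import List


def sample_frame_indices(num_frames: int, clip_len: int = 16) -> List[int]:
    """Pick clip_len frame indices spread evenly across a GIF of num_frames frames.

    Long GIFs are subsampled; GIFs shorter than clip_len repeat frames so the
    clip always has exactly clip_len entries.
    """
    if num_frames <= 0:
        raise ValueError("GIF has no frames")
    # Integer arithmetic keeps every index in the range [0, num_frames - 1].
    return [(i * num_frames) // clip_len for i in range(clip_len)]


print(sample_frame_indices(32))  # every second frame, 0 to 30
print(sample_frame_indices(8))   # each of the 8 frames used twice
```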

Text

  • See testTextSentiment.py for an example.
  • Contains a modified version of the DeepMoji Python 3 repo.
  • sentiment/TextSentiment.py: Extract sentiment from text and smileys.
  • sentiment/build: Example of manually entered training data for finetuning.
  • sentiment/finetuned: Finetuned Keras models used for classification.

Performance

  • Performance of the model was tested on 100 tweets containing emojis from this dataset.
  • This paper showed that emoticon blocking (using the emoji sentiment as the overall sentiment indicator for a sentence) is an effective method of sentiment detection.
  • This was tested and the results can be observed below.

Emoticon Blocking

[Figure: emoticon-blocking results]

Text Only

[Figure: text-first results]

  • Emoticon blocking appears to perform better on this small dataset, which suggests it may also perform better on a larger dataset.
  • Emoticon blocking also handles sarcasm: for example, "I hate it when you do that 😉" is considered positive overall, whereas it would be classified as negative if only the text were considered.
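The difference between the two strategies on the sarcasm example can be sketched in a few lines. The scores are hypothetical stand-ins for the outputs of the text and emoji models (negative below 0, positive above 0), and `text_only`, `emoticon_blocking` and `to_label` are illustrative names, not functions from this repo.

```python
from typing import Optional


def to_label(score: Optional[float]) -> Optional[str]:
    """Map a score to a label: > 0 positive, < 0 negative, exactly 0 neutral."""
    if score is None:
        return None
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def text_only(text_score: Optional[float], emoji_score: Optional[float]) -> Optional[str]:
    """Ignore the emojis entirely and label the sentence from its text."""
    return to_label(text_score)


def emoticon_blocking(text_score: Optional[float], emoji_score: Optional[float]) -> Optional[str]:
    """Use the emoji sentiment as the overall sentiment when it exists, else fall back to text."""
    return to_label(emoji_score if emoji_score is not None else text_score)


# Hypothetical scores for "I hate it when you do that 😉"
text_score, emoji_score = -0.8, 0.5
print(text_only(text_score, emoji_score))          # negative
print(emoticon_blocking(text_score, emoji_score))  # positive
```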

C3D Image Sentiment Model

[Figure: model accuracy]

  • 5,000 images were used in training, 2,500 from each class.
  • The model was trained and evaluated on this balanced dataset using a 70/30 training/validation split.
  • Below are examples of the top negative and positive images from the validation data.

Top Negative

[Images: top eight negative validation images, neg1 to neg8]

Top Positive

[Images: top eight positive validation images, pos1 to pos8]
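A balanced 70/30 split of the 5,000 training images keeps 1,750 images per class for training and 750 per class for validation (3,500 and 1,500 in total). Below is a minimal sketch of such a per-class split, assuming the images are already grouped by label in memory; `stratified_split` and the placeholder file names are hypothetical, not the scripts in training.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    items_by_class: Dict[str, Sequence[str]],
    train_fraction: float = 0.7,
    seed: int = 0,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Shuffle each class separately and cut it at train_fraction, so both splits stay balanced."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train: List[Tuple[str, str]] = []
    val: List[Tuple[str, str]] = []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train.extend((item, label) for item in shuffled[:cut])
        val.extend((item, label) for item in shuffled[cut:])
    return train, val


# Placeholder file names standing in for the 2,500 GIFs per class.
data = {
    "negative": [f"neg_{i}.gif" for i in range(2500)],
    "positive": [f"pos_{i}.gif" for i in range(2500)],
}
train, val = stratified_split(data)
print(len(train), len(val))  # 3500 1500
```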

FAQ

  • How is sentiment calculated?

    • Text, emojis and images are extracted from a chat sentence like the example given below.
    • we lost 😒 😅 😛 <img>https://media.giphy.com/media/2rtQMJvhzOnRe/giphy.gif</img>
    • Sentiment for each is calculated and a score is returned based on the rules in SentimentAnalysis.calculate_scores().
  • How was the image model trained?

    • The C3D model was finetuned using images from GIPHY.
    • Labelled GIPHY images were obtained from GIFGIF.
    • An R script to extract the links from JSON has been included in this project.
    • Text files containing links to the images are generated, and the images can then be downloaded from the terminal using cat file_name.txt | parallel --gnu "wget {}".
    • Or the exact files used for training and validation can be downloaded here.
  • Why is there no sentiment score for some emojis?

    • In my opinion, not all emojis are good indicators of sentiment.
    • Only emojis with obvious indicators of sentiment such as facial expressions, popular symbols and hand signs were used.
  • But what about sarcasm detection?

    • DeepMoji has learned to understand emotions and sarcasm based on millions of emojis.
    • Whether or not the text contains sarcasm is irrelevant: the features extracted using DeepMoji still accurately represent the emotions in the text. These features are used when finetuning new models.
    • Emoticon blocking also helps to recover the actual sentiment in cases of sarcasm. For example, "I hate it when you do that 😉" is actually positive: it contains a positive emoji but negative text. Emoticon blocking takes the emoji sentiment as the overall sentiment, which is correct in this case.
  • Why was the model only evaluated on combinations of emojis and text?

    • The sentiment of images is only used if there is no sentiment available for emojis or text in a sentence. The accuracy of this is the same as when the image model was evaluated individually.
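The first and last answers above can be illustrated together: a chat message is split into text, emojis and GIF URLs, and image sentiment is consulted only when neither the emojis nor the text yield a score. This is a simplified sketch, not SentimentAnalysis.py. Emoji detection here only checks a few Unicode code point ranges (an assumption that ignores skin-tone modifiers and multi-codepoint sequences), and `split_chat_message` and `needs_image_sentiment` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

IMG_TAG = re.compile(r"<img>(.*?)</img>")
# Assumed, incomplete emoji ranges (they also cover a few non-emoji symbols):
# U+1F300-U+1FAFF pictograph blocks plus Misc Symbols and Dingbats.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF)]


def is_emoji(ch: str) -> bool:
    return any(lo <= ord(ch) <= hi for lo, hi in EMOJI_RANGES)


def split_chat_message(message: str) -> Tuple[str, List[str], List[str]]:
    """Return (text, emojis, image_urls) extracted from one chat message."""
    image_urls = IMG_TAG.findall(message)
    rest = IMG_TAG.sub(" ", message)  # drop the <img> tags before scanning characters
    emojis = [ch for ch in rest if is_emoji(ch)]
    text = "".join(ch for ch in rest if not is_emoji(ch))
    return " ".join(text.split()), emojis, image_urls  # collapse leftover whitespace


def needs_image_sentiment(emoji_score: Optional[float], text_score: Optional[float]) -> bool:
    """Images are only used when neither the emojis nor the text produced a score."""
    return emoji_score is None and text_score is None


msg = "we lost 😒 😅 😛 <img>https://media.giphy.com/media/2rtQMJvhzOnRe/giphy.gif</img>"
text, emojis, urls = split_chat_message(msg)
print(text)    # we lost
print(emojis)  # ['😒', '😅', '😛']
print(urls)    # ['https://media.giphy.com/media/2rtQMJvhzOnRe/giphy.gif']
```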

Support

Email: oisin097@hotmail.com


Additional Notes

  • Sentiment analysis of sarcastic text could be tackled directly using DeepMoji. The DeepMoji paper reports that, when finetuned on SCv2-GEN, it detects sarcasm with 75% accuracy. Text could first be passed through this model to detect sarcasm, and the sarcastic sentences then passed to another model finetuned on sentiment-labelled sentences containing sarcasm.

  • In my opinion, this approach would not be very accurate, as the compounding decrease in classification accuracy (is this sarcastic? -> is this positive/negative?) (75% x ??%) would most likely give a poor result. It also adds computational overhead, which reinforces the view that this trade-off would most likely not be worthwhile.
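The compounding effect can be made concrete. Roughly, assuming the sentiment accuracies do not depend on which sentences the sarcasm detector gets wrong, a two-stage pipeline's overall accuracy is a*b + (1 - a)*c, where a is the sarcasm detector's accuracy, b the sentiment accuracy on correctly routed sentences and c the (usually lower) accuracy on misrouted ones. Setting c = 0 gives the pessimistic 75% x ??% product from the note above. Only the 75% figure comes from the DeepMoji paper; the sentiment accuracies used below are hypothetical, and `pipeline_accuracy` is an illustrative helper.

```python
def pipeline_accuracy(sarcasm_acc: float, routed_acc: float, misrouted_acc: float = 0.0) -> float:
    """Overall accuracy of a sarcasm-detection -> sentiment-classification pipeline.

    sarcasm_acc:   fraction of sentences sent to the right sentiment model
    routed_acc:    sentiment accuracy on correctly routed sentences
    misrouted_acc: sentiment accuracy on misrouted sentences (0.0 = worst case)
    """
    return sarcasm_acc * routed_acc + (1 - sarcasm_acc) * misrouted_acc


# 75% sarcasm accuracy (DeepMoji on SCv2-GEN) with a hypothetical 80% sentiment model
print(round(pipeline_accuracy(0.75, 0.80), 3))        # 0.6
print(round(pipeline_accuracy(0.75, 0.80, 0.50), 3))  # 0.725
```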