VADEC

Codes and Datasets for our SIGIR 2021 Paper: "Understanding the Role of Affect Dimensions in Detecting Emotions from Tweets: A Multi-task Approach"

Understanding the Role of Affect Dimensions in Detecting Emotions from Tweets: A Multi-task Approach (SIGIR 2021)

Folder "data" :

Contains the dataset we train our model on.

Folder "analysis_data" :

This folder has COVID-19 related tweets from India, on which we perform our aspect-based analysis. It has two CSV files that contain the predictions of our model along with the cleaned tweets:

  1. panacea_india_data.csv: contains all tweets from January to July 4, 2020
  2. panacea_india_data_filt.csv: contains tweets from March 1, 2020 to July 4, 2020 (day number 61 to day number 186)
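
The day numbers above are days of the year 2020, which is a leap year. As a quick sanity check, here is a minimal sketch using only the standard library. The helper name `day_number` is ours, for illustration, and is not part of the repository:

```python
from datetime import date


def day_number(d: date) -> int:
    """Return the 1-based day of the year for a date (hypothetical helper)."""
    return d.timetuple().tm_yday


# Boundaries of the filtered file, as stated in the list above.
start = day_number(date(2020, 3, 1))
end = day_number(date(2020, 7, 4))
print(start, end)  # 61 186
```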

Folder "aspects" :

It has two subfolders:

  1. raw: contains the raw ABAE output (7 aspects for Annoyed, Optimistic and Surprised, with 100 support words and their scores for each of the aspects)
  2. filtered: contains the hand-filtered output, where incoherent aspects have been discarded. The remaining aspects have been named, and a few generic or irrelevant support words have been discarded as well. This has been carried out for Annoyed and Optimistic. The final data is saved in JSON format.

word2vec.py :

We use this Python file to obtain the word2vec models that ABAE requires to generate the aspects.

normalize_tweets.py :

We use the normalize tweets function in this file to normalize the tweets before running word2vec.py, and also to generate the "clean_text" field of panacea_india_data_filt.csv.
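
The exact normalization steps are defined in normalize_tweets.py. The sketch below only illustrates the kind of cleaning that is common for tweets (lowercasing, replacing URLs and user mentions, dropping the hash sign, collapsing whitespace). The function name `normalize_tweet_sketch` and the placeholder tokens `<url>` and `<user>` are assumptions for illustration, not the repository's code:

```python
import re

# Patterns for the illustrative cleaning steps (assumed, not from the repository).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def normalize_tweet_sketch(text: str) -> str:
    """Return a cleaned version of a tweet (hypothetical stand-in)."""
    text = text.lower()
    text = URL_RE.sub("<url>", text)        # replace links with a placeholder
    text = MENTION_RE.sub("<user>", text)   # replace @mentions with a placeholder
    text = text.replace("#", "")            # keep the hashtag word, drop the sign
    return WHITESPACE_RE.sub(" ", text).strip()


print(normalize_tweet_sketch("Stay SAFE @WHO  https://t.co/abc #COVID19"))
# stay safe <user> <url> covid19
```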

For scraping/hydrating (scrape.py) :

python scrape.py -s True -q [queries] -l [limit on tweets]  
python scrape.py -H True -f [files containing tweets ids]

Note : -H stands for hydration, and -s for scraping. Restrictions related to coordinates and time intervals can be modified directly in the script.

For plotting graphs (plot_graphs.ipynb) :

This notebook is used to plot the counts of aspects (filtered/annoyed.json and filtered/optimistic.json) for tweets read from panacea_india_data_filt.csv. For both emotions, we count the number of occurrences of any of the aspect categories in chunks of 4000 tweets, where every tweet in a chunk contains the emotion being considered (e.g. for annoyed, each tweet must have annoyed in its predictions). An occurrence of any of the support words for an aspect of an emotion contributes 1 to the total count. Run all the cells of plot_graphs.ipynb to generate the plots.
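
The chunked counting described above could be sketched as follows. This is one reading of the description, not the notebook's code: we assume a tweet adds 1 to an aspect's count if it contains at least one of that aspect's support words, and that the aspects are available as a mapping from aspect name to a set of support words. The function name, the toy aspect names, and the toy tweets are made up for illustration:

```python
from typing import Dict, List, Set


def count_aspects_in_chunks(
    tweets: List[str],
    aspects: Dict[str, Set[str]],
    chunk_size: int = 4000,
) -> List[Dict[str, int]]:
    """Count, per chunk, how many tweets mention each aspect.

    `tweets` are cleaned tweets already filtered to a single emotion.
    `aspects` maps an aspect name to its support words (assumed layout).
    """
    results = []
    for start in range(0, len(tweets), chunk_size):
        counts = {name: 0 for name in aspects}
        for tweet in tweets[start:start + chunk_size]:
            tokens = set(tweet.split())
            for name, words in aspects.items():
                # A tweet adds 1 to an aspect if it has any support word.
                if tokens & words:
                    counts[name] += 1
        results.append(counts)
    return results


# Toy example with a chunk size of 2 instead of 4000.
toy_aspects = {"lockdown": {"lockdown", "curfew"}, "health": {"hospital"}}
toy_tweets = ["lockdown extended again", "hospital beds full", "curfew tonight"]
print(count_aspects_in_chunks(toy_tweets, toy_aspects, chunk_size=2))
# [{'lockdown': 1, 'health': 1}, {'lockdown': 1, 'health': 0}]
```

Each dictionary in the returned list corresponds to one chunk, so plotting one aspect over time amounts to reading its count from each chunk in order.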