Official repository to release the code and datasets in the paper "Mining Dual Emotion for Fake News Detection", WWW 2021.

WWW 2021

This is the official repository of the paper:

Mining Dual Emotion for Fake News Detection. [PDF] [Code] [Slides] [Video] [Video explanation in Chinese]

Xueyao Zhang, Juan Cao, Xirong Li, Qiang Sheng, Lei Zhong, and Kai Shu. Proceedings of the 30th Web Conference (WWW 2021).

An Overall Framework

An overall framework of using Dual Emotion Features for fake news detection. Dual Emotion Features consist of three components:

a) Publisher Emotion extracted from the content;

b) Social Emotion extracted from the comments;

c) Emotion Gap representing the similarity and difference between publisher emotion and social emotion.

Dual Emotion Features are concatenated with the features from d) Fake News Detector (here, BiGRU as an example) for the final prediction of veracity.
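As a rough illustration of the concatenation described above, here is a minimal NumPy sketch. It assumes that the emotion gap is modeled as element-wise differences between the publisher emotion and each pooled social emotion vector; this is an illustrative choice, not necessarily the exact formulation in the paper. The function names `dual_emotion_features` and `fuse_with_detector` are hypothetical and do not come from this repository.

```python
import numpy as np


def dual_emotion_features(publisher, social_mean, social_max):
    """Concatenate publisher emotion, social emotion, and an emotion gap.

    Assumption: the gap is the element-wise difference between the publisher
    emotion and each pooled social emotion vector.
    """
    publisher = np.asarray(publisher, dtype=float)
    social_mean = np.asarray(social_mean, dtype=float)
    social_max = np.asarray(social_max, dtype=float)
    social = np.concatenate([social_mean, social_max])
    gap = np.concatenate([publisher - social_mean, publisher - social_max])
    return np.concatenate([publisher, social, gap])


def fuse_with_detector(detector_features, emotion_features):
    """Append the emotion features to the detector's own feature vector."""
    detector_features = np.asarray(detector_features, dtype=float)
    return np.concatenate([detector_features, emotion_features])


# Toy three-dimensional emotion vectors (made-up values).
publisher = [0.7, 0.2, 0.1]
social_mean = [0.1, 0.3, 0.6]
social_max = [0.2, 0.5, 0.9]

emotion = dual_emotion_features(publisher, social_mean, social_max)
# A zero vector stands in for the output of a detector such as BiGRU.
fused = fuse_with_detector(np.zeros(4), emotion)
print(emotion.shape, fused.shape)
```

In a real model the fused vector would feed the final classification layer; the zero vector here only demonstrates the shapes.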

Datasets

The datasets are available at https://drive.google.com/drive/folders/1pjK0BYiiJt0Ya2nRIrOLCVo-o53sYRBV?usp=sharing. The downloaded datasets (i.e., the dataset folder) need to be moved into the root path of this project.

RumourEval-19

The raw dataset is released by SemEval-2019 Task 7:

Genevieve Gorrell, Ahmet Aker, Kalina Bontcheva, Elena Kochkina, Maria Liakata, Arkaitz Zubiaga, Leon Derczynski (2019). SemEval-2019 Task 7: RumourEval, Determining Rumour Veracity and Support for Rumours. Proceedings of the 13th International Workshop on Semantic Evaluation, ACL.

Our experimental dataset is in the folder dataset/RumourEval-19, which contains three json files. In every json file,

  • the id is the unique identifier of the post.
  • the label is the veracity of the post; its value is one of [fake, real, unverified].
  • the content is the content of the post.
  • the comments is the list of users' comments on the post.
  • the content_emotions_labels and content_emotions_probs are the Emotion Category features of the content. The comments100_emotions_labels_mean_pooling, comments100_emotions_labels_max_pooling, comments100_emotions_probs_mean_pooling, and comments100_emotions_probs_max_pooling are the Emotion Category features of the earliest 100 comments. How to use these features is described in the Usage section.
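The fields listed above can be inspected with a few lines of Python. The sketch below assumes that each file holds a JSON list of post objects, which is a guess about the layout rather than something stated here. The file name `toy_split.json` and the helper names are hypothetical, and the toy posts are made up so that the example runs without the real dataset.

```python
import json
import os
import tempfile
from collections import Counter


def load_posts(path):
    """Load one split; assumes the file holds a JSON list of post objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def label_distribution(posts):
    """Count how many posts carry each veracity label."""
    return Counter(post["label"] for post in posts)


# Toy stand-in for a real split; the values are made up for illustration.
toy_posts = [
    {"id": "1", "label": "fake", "content": "post A", "comments": ["c1", "c2"]},
    {"id": "2", "label": "real", "content": "post B", "comments": []},
    {"id": "3", "label": "unverified", "content": "post C", "comments": ["c3"]},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_split.json")  # hypothetical file name
    with open(toy_path, "w", encoding="utf-8") as f:
        json.dump(toy_posts, f)
    posts = load_posts(toy_path)

counts = label_distribution(posts)
print(counts)
```

To try this on the released data, point `load_posts` at one of the json files in dataset/RumourEval-19 and check that the layout assumption holds.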

Weibo-16

The original dataset was first proposed in:

Jing Ma, Wei Gao, Prasenjit Mitra, Sejeong Kwon, Bernard J Jansen, Kam-Fai Wong, and Meeyoung Cha. 2016. Detecting rumors from microblogs with recurrent neural networks. In IJCAI 2016. 3818–3824.

As described in Section 4.1.2 and Appendix A of our paper, the original dataset contains many duplicated fake news posts. The original version of Weibo-16 is in the folder dataset/Weibo-16-original, and our experimental dataset (a deduplicated version) of Weibo-16 is in the folder dataset/Weibo-16. In every json file in these folders,

  • the label is the veracity of the post; its value is one of [fake, real].
  • the content is the content of the post.
  • the comments is the list of users' comments on the post.
  • the content_emotions are the Emotion Category features of the content. The comments100_emotions_mean_pooling and comments100_emotions_max_pooling are the Emotion Category features of the earliest 100 comments. How to use these features is described in the Usage section.
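The mean-pooled and max-pooled comment features named above can be pictured with the following sketch. It assumes that each comment already has a fixed-length Emotion Category vector and that the comments are ordered from earliest to latest, so the first 100 entries are the earliest 100 comments. The function name `pool_comment_emotions` is hypothetical, and the stored features may have been computed differently.

```python
import numpy as np


def pool_comment_emotions(comment_vectors, max_comments=100):
    """Return (mean_pooled, max_pooled) over the first `max_comments` vectors.

    Assumes `comment_vectors` is ordered from earliest to latest comment and
    that every vector has the same length.
    """
    vectors = np.asarray(comment_vectors[:max_comments], dtype=float)
    if vectors.size == 0:
        raise ValueError("at least one comment vector is required")
    return vectors.mean(axis=0), vectors.max(axis=0)


# Toy emotion probabilities for three comments (made-up values).
toy_vectors = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1]]
mean_pooled, max_pooled = pool_comment_emotions(toy_vectors)
print(mean_pooled, max_pooled)
```

How to treat posts without comments is left open here; the sketch simply raises an error in that case.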

Weibo-20

Weibo-20 is our newly proposed dataset and is in the folder dataset/Weibo-20. In addition, in Section 4.4.3 of the paper, we conducted experiments under a real-world scenario simulation; the temporal version of Weibo-20 used there is in the folder dataset/Weibo-20-temporal. In every json file in these folders,

  • the label is the veracity of the post; its value is one of [fake, real].
  • the content is the content of the post.
  • the comments is the list of users' comments on the post.
  • the content_emotions are the Emotion Category features of the content. The comments100_emotions_mean_pooling and comments100_emotions_max_pooling are the Emotion Category features of the earliest 100 comments. How to use these features is described in the Usage section.

Emotion Resources

| Type | Language | Resources |
| --- | --- | --- |
| Emotion Category | English | https://github.com/NVIDIA/sentiment-discovery |
| Emotion Category | Chinese | https://ai.baidu.com/tech/nlp_apply/emotion_detection |
| Emotion Lexicon | English | resources/English/NRC |
| Emotion Lexicon | Chinese | /resources/Chinese/大连理工大学情感词汇本体库 |
| Emotional Intensity | English | resources/English/NRC |
| Emotional Intensity | Chinese | /resources/Chinese/大连理工大学情感词汇本体库 |
| Sentiment Score | English | nltk.sentiment.vader.SentimentIntensityAnalyzer |
| Sentiment Score | Chinese | resources/Chinese/BosonNLP |
| Other Auxiliary Features | English | Wiki: List of emoticons, resources/English/HowNet, resources/English/others |
| Other Auxiliary Features | Chinese | resources/Chinese/HowNet, resources/English/others |
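To give a sense of how a lexicon resource can be turned into features, the sketch below counts lexicon hits per emotion and normalizes by text length. The lexicon `TOY_LEXICON`, the emotion list, and the function `lexicon_features` are all hypothetical stand-ins; the sketch does not read the files under resources/ and may differ from the feature extraction in this repository.

```python
from collections import Counter

# Hypothetical toy lexicon mapping a word to its emotion categories.
TOY_LEXICON = {
    "angry": {"anger"},
    "terrible": {"anger", "fear"},
    "happy": {"joy"},
}
EMOTIONS = ["anger", "fear", "joy"]


def lexicon_features(tokens, lexicon=TOY_LEXICON, emotions=EMOTIONS):
    """Return, per emotion, the fraction of tokens that hit the lexicon."""
    counts = Counter()
    for token in tokens:
        for emotion in lexicon.get(token.lower(), ()):
            counts[emotion] += 1
    total = max(len(tokens), 1)  # avoid division by zero for empty texts
    return [counts[emotion] / total for emotion in emotions]


features = lexicon_features(["I", "am", "Angry", "and", "terrible"])
print(features)
```

Normalizing by token count keeps long and short texts comparable; raw counts would be an equally plausible choice.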

Code

Requirements

Python==3.6.10
Keras==2.1.2
Tensorflow==1.13.1
Tensorflow-GPU==1.14.0

Usage

Step 1: Preprocess

Step 1.1: Get the labels

cd code/preprocess
python output_of_labels.py

Step 1.2: Get the emotion features

cd code/preprocess
python input_of_emotions.py

Note that the Emotion Category features depend on external resources (NVIDIA-sentiment-discovery for English, and Baidu AI for Chinese). They have already been computed and saved in the dataset files (e.g., content_emotions, comments100_emotions_mean_pooling, content_emotions_probs, comments100_emotions_labels_max_pooling).

To extract emotion features for a custom dataset, you need to access these external resources and prepare the Emotion Category features yourself. Alternatively, you can leave Emotion Category unused and extract the other features with input_of_emotions.py.
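Before running the preprocessing on a custom dataset, it may help to check that each record carries the fields used by the released Weibo-style files. The sketch below assumes that a custom dataset should mirror those field names; whether the preprocessing scripts require all of them is not confirmed here. The names `WEIBO_STYLE_FIELDS` and `missing_fields` are hypothetical.

```python
# Field names as documented for the Weibo-16 and Weibo-20 files.
WEIBO_STYLE_FIELDS = (
    "label",
    "content",
    "comments",
    "content_emotions",
    "comments100_emotions_mean_pooling",
    "comments100_emotions_max_pooling",
)


def missing_fields(post, required=WEIBO_STYLE_FIELDS):
    """List the required field names that are absent from one post record."""
    return [name for name in required if name not in post]


# A custom record without precomputed Emotion Category features (made up).
custom_post = {"label": "fake", "content": "some text", "comments": ["c1"]}
print(missing_fields(custom_post))
```

An empty list means the record has every documented field; the three names reported for `custom_post` are exactly the Emotion Category features that would need to be prepared.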

Step 1.3: Get the semantic features

In this repo, the semantic features are word embeddings. Download the pretrained word embeddings (see here for more details) before running the following code:

cd code/preprocess
python input_of_semantics.py

Now, the preprocessed data are stored in preprocess/data.
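For readers unfamiliar with word-embedding inputs, the following sketch shows one common way to turn tokenized posts into a fixed-size array. It assumes zero vectors for padding and for out-of-vocabulary tokens, and truncation at `max_len`; these are illustrative choices, and input_of_semantics.py may handle them differently. The toy embedding table and the function `build_embedding_inputs` are hypothetical.

```python
import numpy as np


def build_embedding_inputs(token_lists, embeddings, dim, max_len):
    """Map token lists to an array of shape (num_posts, max_len, dim).

    Padding positions and unknown tokens are left as zero vectors.
    """
    batch = np.zeros((len(token_lists), max_len, dim), dtype=np.float32)
    for i, tokens in enumerate(token_lists):
        for j, token in enumerate(tokens[:max_len]):
            vector = embeddings.get(token)
            if vector is not None:
                batch[i, j] = vector
    return batch


# Toy three-dimensional embeddings (made-up values).
toy_embeddings = {"fake": [1.0, 0.0, 0.0], "news": [0.0, 1.0, 0.0]}
toy_tokens = [
    ["fake", "news"],
    ["breaking", "news", "fake", "news", "again"],
]
inputs = build_embedding_inputs(toy_tokens, toy_embeddings, dim=3, max_len=4)
print(inputs.shape)
```

An array of this shape is the kind of input a sequence model such as BiGRU consumes.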

Step 2: Configuration

Configure the experimental dataset, the model, and other hyperparameters in code/train/config.py.

Step 3: Training and Testing

cd code/train
python master.py

Now, the results are stored in train/results.

Citation

@inproceedings{10.1145/3442381.3450004,
    author = {Zhang, Xueyao and Cao, Juan and Li, Xirong and Sheng, Qiang and Zhong, Lei and Shu, Kai},
    title = {Mining Dual Emotion for Fake News Detection},
    year = {2021},
    url = {https://doi.org/10.1145/3442381.3450004},
    doi = {10.1145/3442381.3450004},
    booktitle = {Proceedings of the Web Conference 2021},
    pages = {3465–3476},
    series = {WWW '21}
}