Explainable Graph-Based Rumour Detection

This repo is for the Explainable Graph-Based Rumour Detection project. It documents the process and progress of the project's various stages.

Objective

The primary aim of this project is to explain selected state-of-the-art graph-based rumour detection models. In particular, we aim to show how model attribution changes when event responses are restricted, as in early rumour detection. To understand how model attribution differs with architecture, we have selected the following models to explain:

Bi-Directional Graph Convolutional Network (Bi-GCN)
Edge-enhanced Bayesian Graph Convolutional Network (EBGCN)
Claim-guided Hierarchical Graph ATtention Network (ClaHi-GAT)

Related Work

Graph Structural Features

Explainability techniques from computer vision (CV) can be adapted for the graph component of these models. Images can be interpreted as lattice-shaped graphs in which pixels are analogous to nodes, so graphs can be seen as a generalisation of images. Explainability techniques in CV fall into three broad categories:

Gradient
Relevance
Local functions
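To make the gradient category concrete, here is a minimal sketch of gradient × input saliency on a graph. It assumes a toy one-layer linear model (row-normalised adjacency with self-loops, mean pooling, and a weight vector `w`) rather than Bi-GCN, EBGCN or ClaHi-GAT, and `lattice_edges` builds the pixel-lattice view of an image described above. All function names are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int]
Matrix = List[List[float]]


def normalised_adjacency(n: int, edges: List[Edge]) -> Matrix:
    """Row-normalised adjacency with self-loops, A_hat = D^-1 (A + I).

    Edges are treated as undirected (an assumption of this sketch).
    """
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    return [[x / sum(row) for x in row] for row in a]


def score(a_hat: Matrix, x: Matrix, w: List[float]) -> float:
    """Toy graph-level score S = mean_i ((A_hat X)[i] . w), standing in for a GCN."""
    n, d = len(x), len(w)
    h = [[sum(a_hat[i][j] * x[j][k] for j in range(n)) for k in range(d)] for i in range(n)]
    return sum(h[i][k] * w[k] for i in range(n) for k in range(d)) / n


def gradient_x_input(a_hat: Matrix, x: Matrix, w: List[float]) -> List[float]:
    """Per-node saliency: sum_k |dS/dX[j][k] * X[j][k]|.

    Because S is linear in X, dS/dX[j][k] = (1/n) * sum_i A_hat[i][j] * w[k].
    """
    n = len(x)
    saliency = []
    for j in range(n):
        col_mean = sum(a_hat[i][j] for i in range(n)) / n
        saliency.append(sum(abs(col_mean * w[k] * x[j][k]) for k in range(len(w))))
    return saliency


def lattice_edges(rows: int, cols: int) -> List[Edge]:
    """4-neighbour grid graph: pixel (r, c) becomes node r * cols + c."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                edges.append((node, node + 1))
            if r + 1 < rows:
                edges.append((node, node + cols))
    return edges


if __name__ == "__main__":
    # A 2x2 "image" with a single intensity feature per pixel.
    pixels = [[0.1], [0.9], [0.4], [0.0]]
    a_hat = normalised_adjacency(4, lattice_edges(2, 2))
    print(gradient_x_input(a_hat, pixels, w=[1.0]))
```

Because the toy score is linear, its gradient has a closed form. For the selected models, the gradient would instead come from automatic differentiation in the training framework.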

Textual Features

Current NLP explainability work on textual features mainly revolves around counterfactual generation (word replacement). A less commonly used technique is measuring individual word contributions to accumulated features (attention).
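Below is a minimal sketch of word-replacement attribution in its simplest, occlusion-style form. Each token is replaced in turn by a placeholder, and the resulting drop in the classifier's score is taken as that token's contribution. The lexicon-based `lexicon_score` classifier, its weights and the `[MASK]` placeholder are hypothetical stand-ins. Counterfactual generation proper would substitute plausible alternative words rather than a fixed placeholder.

```python
from typing import Callable, Dict, List, Tuple


def lexicon_score(tokens: List[str], weights: Dict[str, float]) -> float:
    """Hypothetical stand-in classifier: sum of per-word rumour weights."""
    return sum(weights.get(t.lower(), 0.0) for t in tokens)


def word_replacement_attribution(
    tokens: List[str],
    model: Callable[[List[str]], float],
    replacement: str = "[MASK]",
) -> List[Tuple[str, float]]:
    """Contribution of each token = score drop when only that token is replaced."""
    base = model(tokens)
    attributions = []
    for i, tok in enumerate(tokens):
        perturbed = tokens[:i] + [replacement] + tokens[i + 1:]
        attributions.append((tok, base - model(perturbed)))
    return attributions


if __name__ == "__main__":
    weights = {"breaking": 0.8, "unconfirmed": 1.2, "official": -0.5}
    tweet = "BREAKING unconfirmed reports of an explosion".split()

    def model(toks: List[str]) -> float:
        return lexicon_score(toks, weights)

    for tok, contrib in word_replacement_attribution(tweet, model):
        print(f"{tok:>12} {contrib:+.2f}")
```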

Proposed Work Plan

The steps and sub-steps of the proposed work plan are listed below:

1. Standardisation of data preprocessing
2. Reimplementation and retraining of the selected models on Twitter 15/16 and PHEME
   a. Bi-GCN
   b. EBGCN
   c. ClaHi-GAT
3. Adaptation of explainability techniques to the selected models
4. Analysis of the generated explanations for:
   a. Full-event rumour detection
   b. Early rumour detection
      i. Time-limited (15 mins, 30 mins, 1 hour, etc.)
      ii. Response-limited (25 responses, 50 responses, 100 responses, etc.)
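The two early-detection settings amount to truncating each propagation tree before it reaches the model. The sketch below assumes each response is stored as a tuple of (tweet ID, parent ID, delay in minutes after the source tweet). That representation and the function names are hypothetical, not the project's actual preprocessing.

```python
from typing import List, NamedTuple, Set


class Response(NamedTuple):
    tweet_id: str
    parent_id: str     # source tweet ID or the ID of another response
    delay_mins: float  # minutes elapsed since the source tweet was posted


def _keep_connected(source_id: str, responses: List[Response]) -> List[Response]:
    """Sort by delay and drop any response whose parent was not kept earlier."""
    kept_ids: Set[str] = {source_id}
    kept: List[Response] = []
    for r in sorted(responses, key=lambda resp: resp.delay_mins):
        if r.parent_id in kept_ids:
            kept_ids.add(r.tweet_id)
            kept.append(r)
    return kept


def time_limited(source_id: str, responses: List[Response], max_mins: float) -> List[Response]:
    """Responses posted within max_mins of the source (e.g. 15, 30, 60)."""
    return _keep_connected(source_id, [r for r in responses if r.delay_mins <= max_mins])


def response_limited(source_id: str, responses: List[Response], max_responses: int) -> List[Response]:
    """The earliest max_responses responses (e.g. 25, 50, 100)."""
    # Every kept response appears after its parent, so any prefix is still a tree.
    return _keep_connected(source_id, responses)[:max_responses]


if __name__ == "__main__":
    tree = [
        Response("r1", "src", 5.0),
        Response("r2", "r1", 20.0),
        Response("r3", "src", 45.0),
        Response("r4", "r3", 90.0),
    ]
    print([r.tweet_id for r in time_limited("src", tree, 30)])      # ['r1', 'r2']
    print([r.tweet_id for r in response_limited("src", tree, 3)])   # ['r1', 'r2', 'r3']
```

Responses whose parent falls outside the limit are dropped, so each truncated input stays a connected tree rooted at the source. An alternative would be to reattach such orphans directly to the source tweet.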

Updates

Dataset and preprocessing

Dataset	Tweets	Links	Unique Tweet IDs	Source Tweets (Trees)	Unique % (Unique IDs / Tweets)
Twitter 15	598,258	604,825	53,641	1,490	8.97%
Twitter 16	347,360	351,623	26,402	818	7.60%
Twitter 15 ∪ Twitter 16	895,427	900,776	74,254	2,139	8.29%
Twitter 15 ∩ Twitter 16	54,454	55,672	5,789	169	10.63%
PHEME-9	119,419	-	-	-	-
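The Unique % column is consistent with Unique Tweet IDs divided by Tweets. The union and intersection rows are consistent with inclusion-exclusion on the unique-ID and source-tweet counts. The sketch below recomputes both from the table's own numbers. `overlap_stats` is a hypothetical helper that assumes the tweet IDs have already been loaded into sets of strings; loading them from the tree files is not shown.

```python
from typing import Dict, Set


def unique_pct(unique_ids: int, tweets: int) -> float:
    """Unique Tweet IDs as a percentage of Tweets."""
    return 100.0 * unique_ids / tweets


def overlap_stats(a: Set[str], b: Set[str]) -> Dict[str, int]:
    """Union and intersection sizes of two tweet-ID sets."""
    return {"union": len(a | b), "intersection": len(a & b)}


if __name__ == "__main__":
    rows = [
        ("Twitter 15", 53_641, 598_258),
        ("Twitter 16", 26_402, 347_360),
        ("Twitter 15 | 16 (union)", 74_254, 895_427),
        ("Twitter 15 & 16 (intersection)", 5_789, 54_454),
    ]
    for name, unique, tweets in rows:
        print(f"{name}: {unique_pct(unique, tweets):.2f}%")
    # Inclusion-exclusion holds for the table's unique-ID and source-tweet counts.
    assert 53_641 + 26_402 - 5_789 == 74_254
    assert 1_490 + 818 - 169 == 2_139
```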

Twitter 15 and 16 - RvNN GitHub Repo vs RumDect2017 Dropbox

Comparing the top-down (TD) and bottom-up (BU) files from the RvNN GitHub repo against RumDect2017

Dataset	Tweet IDs	% of RumDect2017 Tweet IDs	Tweet IDs not in RumDect2017
RumDect2017	53,641	-	-
TD File	1,797	3.35%	508
BU File	3,098	5.78%	957
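The percentage column appears to equal the number of tweet IDs in each file divided by the 53,641 RumDect2017 IDs (for example, 1,797 / 53,641 ≈ 3.35%). The last column is a set difference. A minimal sketch of both follows. It assumes the IDs have already been extracted from the TD/BU files into iterables of strings, and the helper name is hypothetical.

```python
from typing import Dict, Iterable


def compare_to_reference(file_ids: Iterable[str], reference_ids: Iterable[str]) -> Dict[str, float]:
    file_set, ref_set = set(file_ids), set(reference_ids)
    return {
        "tweet_ids": len(file_set),
        # Size of the file's ID set relative to the reference set.
        "pct_of_reference": 100.0 * len(file_set) / len(ref_set),
        # IDs in the file that never occur in the reference.
        "not_in_reference": len(file_set - ref_set),
    }


if __name__ == "__main__":
    # The table's percentages match this ratio:
    print(f"TD: {100.0 * 1_797 / 53_641:.2f}%")  # TD: 3.35%
    print(f"BU: {100.0 * 3_098 / 53_641:.2f}%")  # BU: 5.78%
```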

Comparing the n-fold split files from the RvNN GitHub repo against RumDect2017

Dataset	Source Tweet IDs	Unseen IDs	% of All Tweet IDs
RumDect2017	2,139	100%	-
Train 15	nfold	1490
Test 15	nfold	1480
Train 16	nfold	818
Test 16	nfold	819
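The per-fold comparison can be sketched as a few set operations. This sketch reads "unseen" as "source IDs in a split file that RumDect2017 does not contain". That reading is an assumption, as are the helper name and the idea that the split files have already been parsed into lists of source tweet IDs. The sketch also checks that no source tweet appears in both the train and test halves of a fold.

```python
from typing import Dict, Iterable


def check_fold(train_ids: Iterable[str], test_ids: Iterable[str],
               source_ids: Iterable[str]) -> Dict[str, int]:
    train, test, sources = set(train_ids), set(test_ids), set(source_ids)
    return {
        "train": len(train),
        "test": len(test),
        # Should be 0: a source tweet must not appear in both halves of a fold.
        "train_test_overlap": len(train & test),
        # Source IDs in the split files that RumDect2017 does not contain.
        "train_not_in_reference": len(train - sources),
        "test_not_in_reference": len(test - sources),
    }


if __name__ == "__main__":
    reference = {"s1", "s2", "s3", "s4"}
    print(check_fold(["s1", "s2", "s9"], ["s3", "s4"], reference))
```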

Social-AI-Studio/Explain_Rumor