This repo is for the Explainable Graph-Based Rumour Detection project. It documents the process and progress of the project's various stages.
The primary aim of this project is to explain selected state-of-the-art graph-based rumour detection models. In particular, we aim to show how model attribution changes when event responses are restricted, as in early rumour detection. To understand how model attribution varies with architecture, we have selected the following models to explain:
- Bi-directional Graph Convolution Network (Bi-GCN)
- Edge-enhanced Bayesian Graph Convolution Network (EBGCN)
- Claim-guided Hierarchical Graph ATtention Network (ClaHi-GAT)
Explainability techniques from computer vision (CV) can be adapted for the graph component. An image can be interpreted as a lattice-shaped graph whose pixels are analogous to nodes, so graphs can be seen as a generalisation of images. CV explainability techniques fall into three broad categories:
- Gradient
- Relevance
- Local functions
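To make the image-as-graph analogy concrete, the sketch below builds the edge list of a 4-neighbour lattice graph for an image of a given size. The function name `image_to_lattice_edges` and the row-major node numbering are illustrative assumptions, not part of this project's code.

```python
from typing import List, Tuple


def image_to_lattice_edges(height: int, width: int) -> List[Tuple[int, int]]:
    """Return undirected edges of a 4-neighbour lattice over height x width pixels.

    Pixels are numbered row by row (an assumed convention): node = row * width + col.
    """
    edges = []
    for row in range(height):
        for col in range(width):
            node = row * width + col
            if col + 1 < width:
                # Connect to the pixel on the right.
                edges.append((node, node + 1))
            if row + 1 < height:
                # Connect to the pixel below.
                edges.append((node, node + width))
    return edges


# A 3x3 image has 6 horizontal and 6 vertical neighbour pairs.
lattice_edges = image_to_lattice_edges(3, 3)
```

A rumour propagation tree is the same kind of object with an irregular edge list, which is why pixel-level attribution methods can, in principle, be carried over to node-level attribution.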
For textual features, current NLP explainability work mainly revolves around counterfactual generation (Word Replacement). A less commonly used technique is measuring individual word contributions to accumulated features (Attention).
The proposed work plan, with its steps and sub-steps, is listed below:
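As a rough illustration of the word-replacement idea, the sketch below replaces one token at a time with a placeholder and records how much a model score drops. This is a simplified perturbation scheme rather than a full counterfactual generator. `toy_score`, its cue-word list, and the `[MASK]` placeholder are hypothetical stand-ins for a trained rumour detection model.

```python
from typing import Callable, List, Tuple


def word_replacement_attribution(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
    placeholder: str = "[MASK]",
) -> List[Tuple[str, float]]:
    """Score each token by the drop in score_fn when it is replaced."""
    base = score_fn(tokens)
    attributions = []
    for i, token in enumerate(tokens):
        perturbed = tokens[:i] + [placeholder] + tokens[i + 1:]
        attributions.append((token, base - score_fn(perturbed)))
    return attributions


def toy_score(tokens: List[str]) -> float:
    """Hypothetical scorer: fraction of tokens that are assumed rumour cues."""
    cues = {"unconfirmed", "allegedly"}
    hits = sum(token.lower() in cues for token in tokens)
    return hits / max(len(tokens), 1)


word_attributions = word_replacement_attribution(
    ["Unconfirmed", "reports", "of", "explosion"], toy_score
)
```

With this toy scorer, only the cue word receives a non-zero attribution. With a real model, `score_fn` would wrap a forward pass that returns the probability of the predicted class.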
- Standardisation of data preprocessing
- Reimplement and retrain the selected models on Twitter 15/16 and PHEME
  - Bi-GCN
  - EBGCN
  - ClaHi-GAT
- Adapt explainability techniques to be applied to the models
- Analyse generated explanations for:
  - Full event rumour detection
  - Early rumour detection
    - Time limited (15 mins, 30 mins, 1 hour, etc.)
    - Response limited (25 responses, 50 responses, 100 responses, etc.)
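
The two early-detection settings in the plan above can be sketched as a filter over an event's responses. The `Response` record and its field names (`tweet_id`, `parent_id`, `delay_minutes`) are assumptions made for illustration; the actual data format in this repo may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    tweet_id: str
    parent_id: str
    delay_minutes: float  # time elapsed since the source tweet (assumed field)


def restrict_responses(
    responses: List[Response],
    max_minutes: Optional[float] = None,
    max_responses: Optional[int] = None,
) -> List[Response]:
    """Keep only the responses visible under a time and/or count limit."""
    ordered = sorted(responses, key=lambda r: r.delay_minutes)
    if max_minutes is not None:
        # Time limited: drop responses posted after the cut-off.
        ordered = [r for r in ordered if r.delay_minutes <= max_minutes]
    if max_responses is not None:
        # Response limited: keep the earliest N responses.
        ordered = ordered[:max_responses]
    return ordered


event = [
    Response("r1", "src", 5.0),
    Response("r2", "r1", 20.0),
    Response("r3", "src", 45.0),
]
within_30_mins = restrict_responses(event, max_minutes=30)
first_two = restrict_responses(event, max_responses=2)
```

Because responses are kept in chronological order, a reply is never retained without its parent, provided each reply is posted no earlier than the tweet it answers.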
Dataset | Tweets | Links | Unique Tweet IDs | Source Tweets/Tree | Unique % |
---|---|---|---|---|---|
Twitter 15 | 598,258 | 604,825 | 53,641 | 1,490 | 8.97% |
Twitter 16 | 347,360 | 351,623 | 26,402 | 818 | 7.6% |
Twitter 15 ∪ Twitter 16 | 895,427 | 900,776 | 74,254 | 2,139 | 8.29% |
Twitter 15 ∩ Twitter 16 | 54,454 | 55,672 | 5,789 | 169 | 10.63% |
PHEME-9 | 119,419 | - | - | - | - |
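
The derived figures in the table above can be rechecked with a few lines of arithmetic. The sketch assumes that "Unique %" is the number of unique tweet IDs divided by the number of tweets. That reading is inferred from the numbers and is not stated in the source. The union row is checked with inclusion-exclusion on the two set-valued columns.

```python
# Counts copied from the table: (tweets, unique tweet IDs, source tweets).
counts = {
    "Twitter 15": (598_258, 53_641, 1_490),
    "Twitter 16": (347_360, 26_402, 818),
    "Union": (895_427, 74_254, 2_139),
    "Intersection": (54_454, 5_789, 169),
}


def unique_pct(tweets: int, unique_ids: int) -> float:
    """Assumed definition of the 'Unique %' column."""
    return round(100 * unique_ids / tweets, 2)


unique_percentages = {
    name: unique_pct(tweets, unique_ids)
    for name, (tweets, unique_ids, _) in counts.items()
}


def union_size(a: int, b: int, intersection: int) -> int:
    """Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return a + b - intersection


union_unique_ids = union_size(53_641, 26_402, 5_789)
union_source_tweets = union_size(1_490, 818, 169)
```

Both union values agree with the "Twitter 15 ∪ Twitter 16" row, and the recomputed percentages match the last column to two decimal places.
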
Dataset | Tweet IDs | % of RumDect2017 Tweet IDs | Tweet IDs not in RumDect2017 |
---|---|---|---|
RumDect2017 | 53,641 | - | - |
TD File | 1,797 | 3.35% | 508 |
BU File | 3,098 | 5.78% | 957 |
Dataset | Source Tweet IDs | Unseen IDs | % of All Tweet IDs |
---|---|---|---|
RumDect2017 | 2,139 | 100% | - |
Train 15 | nfold | 1,490 | |
Test 15 | nfold | 1,480 | |
Train 16 | nfold | 818 | |
Test 16 | nfold | 819 | |