Rumour Detection via Zero-shot Cross-lingual Transfer Learning

Paper

L. Tian, X. Zhang, and J.H. Lau (2021). Rumour Detection via Zero-shot Cross-lingual Transfer Learning. In Proceedings of The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021), pages 603-618.

Data and Resources

We use the Twitter15/Twitter16, PHEME, and WEIBO datasets for rumour detection. The five-fold split is from here; if you need to regenerate a comparable split yourself, see the sketch after the dataset table below.

This repository does not include the raw input data. Please download the datasets from the links below.

Dataset                            Link
T15/T16                            https://www.dropbox.com/s/7ewzdrbelpmrnxu/rumdetect2017.zip?dl=0
PHEME                              https://figshare.com/articles/dataset/PHEME_dataset_of_rumours_and_non-rumours/4010619
WEIBO                              https://alt.qcri.org/~wgao/data/rumdect.zip
Extra Chinese Pretraining Data     https://archive.ics.uci.edu/ml/datasets/microblogPCU
Extra English Pretraining Data     https://github.com/KaiDMML/FakeNewsNet
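
The five-fold cross-validation split used in the paper comes from the external link referenced above. If that split is unavailable, a comparable stratified five-fold split can be regenerated with scikit-learn. The sketch below assumes each source post is stored as a (post_id, label) pair built from the downloaded datasets; it is only an illustration, not the official split distributed with the paper.

# Sketch: regenerate a stratified five-fold split over source posts.
# `examples` is assumed to be a list of (post_id, label) pairs built from the
# downloaded datasets; this is NOT the official split used in the paper.
from sklearn.model_selection import StratifiedKFold

def make_five_fold_split(examples, seed=42):
    post_ids = [pid for pid, _ in examples]
    labels = [label for _, label in examples]
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    folds = []
    for train_idx, test_idx in skf.split(post_ids, labels):
        folds.append({
            "train": [post_ids[i] for i in train_idx],
            "test": [post_ids[i] for i in test_idx],
        })
    return folds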

Dependencies

  1. Python 3.6
  2. Run pip install -r requirements.txt

Adaptive Pretraining

Clone the transformers repository to a local directory:

git clone https://github.com/huggingface/transformers.git

To further pre-train the language model:

python transformers/examples/language-modeling/run_language_modeling.py \
        --output_dir='ML_MBERT_DAPT' \
        --model_type=bert \
        --model_name_or_path=bert-base-multilingual-cased-freeze-we \
        --do_train \
        --overwrite_output_dir \
        --train_data_file='train.txt' \
        --do_eval \
        --block_size=256 \
        --eval_data_file='vali.txt' \
        --mlm

If you find this code useful, please let us know and cite our paper.