Text categorization experiments for automatic fake news detection.
This repository contains the Python code for the experiments described in the paper:
Z. Bodó. Fake news detection without external knowledge. Submitted to MDIS 2020.
usage: test.py [-h] --dataset DATASET --method METHOD [--text TEXT]
[--gpu GPU]
optional arguments:
-h, --help show this help message and exit
--dataset DATASET, -d DATASET
Dataset name.
Available datasets:
0 - FNAMT (FakeNewsAMT)
1 - Celeb (Celebrity)
2 - BF (BuzzFeed)
3 - Rand (Random)
4 - MD (McIntire's dataset)
5 - UTK (UTK-MLC)
6 - GC (GossipCop)
7 - PF (PolitiFact)
--method METHOD, -m METHOD
Method.
Available methods:
0 - baseline binary
1 - baseline freq
2 - baseline tf-idf
3 - uni+bigrams binary
4 - new tokenizer
5 - chi2 5000
6 - stat. features
7 - tweak1
8 - tweak2
9 - c-cnn
10 - w-cnn
11 - lstm
12 - a-blstm
13 - gcn
--text TEXT, -t TEXT Text data to use (default=2).
0 - article
1 - title
2 - title+article
--gpu GPU, -g GPU Use GPU if available (default=1 [true]).
The datasets were preprocessed as needed for our methods (can be found in the data
subdirectory),
but one can find the links to the original corpora in the paper.
The range 0-8 of the methods represents bag-of-words models, while 9-13 are neural network models for text categorization. For the description of the methods see the paper and/or source code.
For graph convolutional networks the normalized symmetric adjacency matrix (A
) needs to be calculated and saved.
We uploaded the pre-computed matrices only for the first 4 datasets (because of its size); these are stored in the gcn
subfolder.