Fake News Detection: Model Implementations and Hyper-Parameters

Code and Hyper-Parameters for the QICC Fake News Competition

This repository contains all the source code used in the paper [LINK (coming soon)] and in the QICC Fake News Competition by Team FAR-NLP and Team AI Musketeers.

Setup

For the transformer-based code, we used Google Colab with a GPU-accelerated instance. All transformers are based on HuggingFace's implementation.

The code is left as-is from the competition, with minor edits.
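
For orientation, here is a minimal sketch of the fine-tuning setup this implies, using HuggingFace's Trainer API with the transformer hyper-parameters from Tables 1 and 2 below (MAXSEQLN 128, LR 2e-5, batch size 32, up to 5 epochs). This is not the competition code: the checkpoint choice, output directory, and toy dataset are illustrative assumptions.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Any of the three checkpoints from the tables would fit here; mBERT shown.
CHECKPOINT = "bert-base-multilingual-cased"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT,
                                                           num_labels=2)

def tokenize(batch):
    # MAXSEQLN: 128 (Tables 1 and 2)
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=128)

# Toy placeholder data; the notebooks load the actual competition datasets.
train_ds = Dataset.from_dict({"text": ["a real article", "a fake article"],
                              "label": [0, 1]}).map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="outputs",            # illustrative path
    learning_rate=2e-5,              # LR: 2e-5
    per_device_train_batch_size=32,  # BATCHSIZE: 32
    num_train_epochs=5,              # EPOCHS: up to 5
)

Trainer(model=model, args=args, train_dataset=train_ds).train()
```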

Hyper-Parameters

Table 1: Fake News Detection Hyper-Parameters

| Model | Hyper-Parameters |
| --- | --- |
| NB | smoothing parameter = 10 |
| SVM | penalty parameter = 21, kernel = RBF |
| RF | estimators = 271 |
| XGBoost | estimators = 10, learning rate = 1, gamma = 0.5 |
| mBERT-base, XLNet-base, RoBERTa-base | MAXSEQLN: 128, LR: 2e-5, BATCHSIZE: 32, EPOCHS: up to 5 |
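
For concreteness, the classical settings above map onto scikit-learn and XGBoost roughly as follows. This is a minimal sketch: the feature matrix `X` and labels `y` are placeholders, since the table does not specify the feature pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from xgboost import XGBClassifier

X = np.random.rand(8, 4)   # placeholder features (non-negative for NB)
y = np.array([0, 1] * 4)   # placeholder labels

models = {
    "NB": MultinomialNB(alpha=10),                   # smoothing parameter = 10
    "SVM": SVC(C=21, kernel="rbf"),                  # penalty = 21, RBF kernel
    "RF": RandomForestClassifier(n_estimators=271),  # estimators = 271
    "XGBoost": XGBClassifier(n_estimators=10,        # estimators = 10
                             learning_rate=1.0,      # learning rate = 1
                             gamma=0.5),             # gamma = 0.5
}

for name, clf in models.items():
    clf.fit(X, y)
```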

Table 2: News Domain Detection Hyper-Parameters

| Model | Hyper-Parameters |
| --- | --- |
| TF-IDF | 1-to-4-gram and 1-to-6-gram |
| CNN* | EMBSIZE: 300; 2 stacked CNNs: 256 kernels of size 5, then 64 kernels of size 5; dropout: 0.1; epochs: max of 40 |
| 3CCNN* | EMBSIZE: 300; 512 kernels of sizes 3, 4, and 5; dropout: 0.3; epochs: max of 40 |
| LSTM* | EMBSIZE: 300; hidden size: 300; dropout: 0.05; epochs: max of 40 |
| GRU* | same as LSTM |
| Bi-LSTM* | same as LSTM |
| Bi-LSTM with attention* | same as LSTM |
| Bi-LSTM with attention** | same as LSTM |
| mBERT-base, XLNet-base, RoBERTa-base | MAXSEQLN: 128; LR: 2e-5; BATCHSIZE: 32; EPOCHS: up to 5 |
| RMDL | EMBSIZE: 50; MAXSEQUENCELENGTH: 500; MAXNBWORDS: 5000; ensemble of 10 DNNs, 10 RNNs, and 10 CNNs; 100 epochs each; DNN: default parameters except maxnodesdnn = 512; RNN and CNN: default parameters; Adam optimizer; dropout: 0.07 |
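
To make the CNN* row concrete, here is a minimal Keras sketch of the 2-stacked CNN (256 then 64 kernels of size 5, 300-dimensional embeddings, dropout 0.1). The vocabulary size, class count, pooling, and optimizer are assumptions not given in the table.

```python
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000   # assumption: not specified in the table
NUM_CLASSES = 6       # assumption

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 300),         # EMBSIZE: 300
    layers.Conv1D(256, 5, activation="relu"),  # 256 kernels of size 5
    layers.Conv1D(64, 5, activation="relu"),   # 64 kernels of size 5
    layers.GlobalMaxPooling1D(),
    layers.Dropout(0.1),                       # dropout: 0.1
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training then runs for at most 40 epochs, e.g. model.fit(X, y, epochs=40).
```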

Files Description

  • Transformers-Team_AI_Musketeers/
    • Data_preperation.ipynb: Notebook used for dataset preparation for the transformer-based models
    • Fake_News_BERT_RoBERTa.ipynb and Fake_News_Model_XLNet.ipynb: Notebooks used for training the transformer models on the fake news dataset on Google Colab
    • Fake_News_Models_with_BERTViz.ipynb: Notebook testing the BertViz visualization tool for better interpretability
    • News_Domain_BERT_RoBERTa_XLNet.ipynb: Notebook used for training the transformer models on the news domain identification task on Google Colab
    • News_Domain_ML.ipynb: Notebook used to evaluate the other models on the news domain identification task
    • modeling.py: Extends BERT, XLNet, and RoBERTa to support multi-label classification
    • multiutils.py: Data preparation and fine-tuning data creation for multi-label classification
    • utils.py: Data preparation and fine-tuning data creation for binary classification
  • Feature_based-Team_FAR_NLP/
    • fake_news/
      • google_search.py: Google Search API access code for extracting search results
      • preprocessing.py: Preprocessing source code for the fake news articles
    • topic/
      • get_entities.py: Google Cloud Natural Language API access code for extracting entities from the text
      • model.py: PyTorch-based Bi-LSTM with attention model (a sketch of this architecture appears after this list)
      • train_topic.py: Training script
  • Twitter_bot.ipynb: Notebook with the preprocessing and model used for the Twitter bot detection task
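
Since the source of model.py is not reproduced in this README, the following is a minimal PyTorch sketch of a Bi-LSTM with attention in that spirit, wired to the LSTM* settings from Table 2 (EMBSIZE 300, hidden size 300, dropout 0.05). The attention formulation, vocabulary size, and class count are illustrative assumptions, not the file's actual implementation.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    def __init__(self, vocab_size, num_classes, emb_size=300, hidden=300):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_size)   # EMBSIZE: 300
        self.lstm = nn.LSTM(emb_size, hidden, batch_first=True,
                            bidirectional=True)         # hidden size: 300
        self.attn = nn.Linear(2 * hidden, 1)  # scores each time step
        self.drop = nn.Dropout(0.05)          # dropout: 0.05
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                     # x: (batch, seq_len) token ids
        h, _ = self.lstm(self.emb(x))         # (batch, seq_len, 2 * hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over time
        context = (weights * h).sum(dim=1)    # weighted sum of LSTM states
        return self.fc(self.drop(context))

# Illustrative sizes: 20k vocabulary, 6 classes, batch of 4, sequence length 50.
model = BiLSTMAttention(vocab_size=20_000, num_classes=6)
logits = model(torch.randint(0, 20_000, (4, 50)))
```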

Paper

Link coming soon.