VLSP2020: Fake News Detection

Fine-tune a variety of pre-trained Transformer-based models to solve the Vietnamese Reliable Intelligence Identification (ReINTEL) problem in the VLSP2020 shared task.

About The Project

In this project, we leverage several pre-trained language models, including vELECTRA, vBERT, PhoBERT, BERT Multilingual Cased, and XLM-RoBERTa, to identify reliable information shared on social networking sites.

We compare models with different maximum input lengths: 256 tokens, 512 tokens, and multiple 512-token segments for long documents.
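For the long-document setting, one common way to feed a post longer than 512 tokens to a 512-token model is to split its token ids into consecutive windows and then combine the per-window predictions. The sketch below is an illustration under stated assumptions, not necessarily how this project handles long documents. The following are all hypothetical:

- the function names `split_into_segments` and `average_probabilities`
- the two slots reserved for special tokens
- the optional `stride` overlap
- mean-pooling of per-segment probabilities

```python
from typing import List


def split_into_segments(token_ids: List[int], max_len: int = 512,
                        num_special: int = 2, stride: int = 0) -> List[List[int]]:
    """Hypothetical helper: split a long token-id sequence into model-sized windows.

    num_special: slots reserved for special tokens (e.g. <s> and </s>) added later.
    stride: number of tokens shared between consecutive windows (0 = no overlap).
    """
    body = max_len - num_special
    if body <= 0:
        raise ValueError("max_len must exceed num_special")
    if not 0 <= stride < body:
        raise ValueError("stride must be in [0, max_len - num_special)")
    step = body - stride
    segments = []
    start = 0
    while True:
        segments.append(token_ids[start:start + body])
        # Stop once the current window reaches the end of the sequence.
        if start + body >= len(token_ids):
            break
        start += step
    return segments


def average_probabilities(segment_probs: List[float]) -> float:
    """Hypothetical aggregation: mean positive-class probability over segments."""
    return sum(segment_probs) / len(segment_probs)


if __name__ == "__main__":
    segments = split_into_segments(list(range(1200)))
    print([len(s) for s in segments])
```

Averaging is only one possible choice. Taking the maximum segment probability, or pooling segment embeddings before the classifier head, are equally plausible alternatives.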

Prerequisites

To reproduce our experiments, install the following dependencies:

  • Hugging Face Transformers
  • emoji
  • VnCoreNLP
  • NLTK
  • PyTorch
  • Python 3

The Python packages are listed in requirements.txt and can be installed with:
pip install -r requirements.txt
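Before running any training script, a quick way to confirm the environment is complete is to check that each dependency can be found. This is a minimal sketch and is not part of the repository. The module names in `REQUIRED_MODULES` are assumptions: they are the usual import names for the packages above, not values read from requirements.txt.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names for the packages listed above (not taken from requirements.txt).
REQUIRED_MODULES = ["transformers", "emoji", "vncorenlp", "nltk", "torch"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

`find_spec` only locates modules without importing them. This keeps the check fast, but it does not catch packages that are present yet fail on import.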

Data

The dataset is provided by the VLSP2020 organizers. Please visit this site for more information.

Contact

Hieu Tran - heraclex12@gmail.com

Project Link: https://github.com/heraclex12/VLSP2020-Fake-News-Detection

Citation

@misc{tran2020leveraging,
      title={Leveraging Transfer Learning for Reliable Intelligence Identification on Vietnamese SNSs (ReINTEL)}, 
      author={Trung-Hieu Tran and Long Phan and Truong-Son Nguyen},
      year={2020},
      eprint={2012.07557},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgements