Fake-news-detection

Fake news detection on Chinese and English corpora using a CNN and four traditional machine learning algorithms.

Introduction

  • The experiments are separated into two parts: one is based on the English corpus and the other on the Chinese corpus.
  • Five different algorithms are compared: one is a convolutional neural network (CNN), and the other four are traditional machine learning approaches. In addition, data visualization is used to illustrate the results of the different models.
  • The code for the CNN model is based on NLPproject_6.ipynb by siddarthhair95.
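The comparison of traditional models described above can be sketched with Scikit-learn roughly as follows. This is a minimal illustration, not the repository's code: the source does not name the four traditional algorithms, so the four classifiers chosen here (naive Bayes, logistic regression, linear SVM, random forest), the TF-IDF features, the `compare_models` helper, and the tiny toy corpus are all assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data invented for illustration only (1 = fake, 0 = real).
train_texts = [
    "shocking miracle cure doctors hate",
    "aliens secretly control the government",
    "celebrity secretly replaced by clone",
    "city council approves new budget",
    "local school opens new library",
    "government publishes annual budget report",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["miracle cure secretly revealed", "council publishes library report"]
test_labels = [1, 0]

# Hypothetical choice of four traditional classifiers (not named in the source).
models = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}


def compare_models(models, train_texts, train_labels, test_texts, test_labels):
    """Fit each classifier on TF-IDF features and return its test accuracy."""
    scores = {}
    for name, clf in models.items():
        # A fresh vectorizer per model keeps the pipelines independent.
        pipeline = make_pipeline(TfidfVectorizer(), clf)
        pipeline.fit(train_texts, train_labels)
        predictions = pipeline.predict(test_texts)
        scores[name] = accuracy_score(test_labels, predictions)
    return scores


scores = compare_models(models, train_texts, train_labels, test_texts, test_labels)
for name, accuracy in sorted(scores.items()):
    print(f"{name}: {accuracy:.2f}")
```

Keeping every model behind the same pipeline interface means the per-model scores can be collected in one dictionary, which is convenient input for a comparison plot.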

Dataset

  • Chinese corpus: Data collected from Weibo (Chinese_Rumor_Dataset).
  • English corpus: The LIAR dataset from William Yang Wang, "Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection. It contains 12.8K short statements collected from PolitiFact.com, which provides a detailed analysis report and links to source documents for each case.
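As a rough sketch of how the English corpus might be read, the snippet below parses tab-separated rows in the LIAR layout (ID, label, statement, then metadata columns) and maps the six LIAR labels to a binary fake/real target. The binary mapping, the `read_liar_rows` helper, and the sample rows (invented and truncated to four columns) are assumptions for illustration; the notebooks may handle the labels differently.

```python
import csv
import io

# LIAR TSV layout: column 0 is the ID, column 1 the label, column 2 the statement.
LABEL_COLUMN = 1
STATEMENT_COLUMN = 2

# Assumption: collapse the six LIAR labels into fake (1) versus real (0).
FAKE_LABELS = {"pants-fire", "false", "barely-true"}
REAL_LABELS = {"half-true", "mostly-true", "true"}


def read_liar_rows(handle):
    """Return a list of (statement, binary_label) pairs from a LIAR-style TSV."""
    samples = []
    # QUOTE_NONE keeps quotation marks inside statements untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= STATEMENT_COLUMN:
            continue  # skip malformed or empty rows
        label = row[LABEL_COLUMN].strip().lower()
        if label in FAKE_LABELS:
            target = 1
        elif label in REAL_LABELS:
            target = 0
        else:
            continue  # skip rows with an unknown label
        samples.append((row[STATEMENT_COLUMN].strip(), target))
    return samples


# Invented sample rows, not taken from the real dataset.
sample_tsv = io.StringIO(
    "1.json\tfalse\tThe moon is made of cheese.\tscience\n"
    "2.json\tmostly-true\tThe city budget grew last year.\teconomy\n"
    "3.json\tunknown-label\tThis row is skipped.\tmisc\n",
    newline="",
)
samples = read_liar_rows(sample_tsv)
print(samples)
```

For a real file, the same function can be given a handle opened with `open(path, encoding="utf-8", newline="")`, where `path` points to one of the dataset's TSV files.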

Environment

  • Language: Python 3.7
  • IDE: Jupyter Notebook
  • Pre-trained Word Embedding: GloVe.6B.100d.txt, sgns.weibo.bigram-char.bz2
  • Database Toolkit: Pickle
  • Machine Learning Toolkit: Scikit-learn
  • Deep Learning Toolkit: Keras, TensorFlow
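To illustrate how the pre-trained embeddings and Pickle from the list above could fit together, here is a minimal sketch that parses GloVe-style text lines (a word followed by space-separated floats) into a dictionary and round-trips it through `pickle`. The `parse_embeddings` helper and the 3-dimensional sample vectors are invented for illustration; the real GloVe file listed above uses 100 dimensions, and how the notebooks actually cache data is not stated in the source.

```python
import io
import pickle


def parse_embeddings(lines):
    """Parse 'word v1 v2 ...' lines into a {word: [float, ...]} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Invented 3-dimensional vectors standing in for a real embedding file.
sample_file = io.StringIO("news 0.1 -0.2 0.3\nfake 0.4 0.5 -0.6\n")
embeddings = parse_embeddings(sample_file)

# Serialize once, then reload without re-parsing the text file.
blob = pickle.dumps(embeddings)
restored = pickle.loads(blob)
print(sorted(restored))
```

Parsing a large embedding file is slow, so caching the parsed dictionary (for example with `pickle.dump` to a file on disk) avoids repeating that work on every notebook run.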