Fake News Detection: It's All in the Data!

Welcome to the Fake News Research Datasets repository. This repository accompanies our paper: we re-upload publicly available datasets, summarize their contents, and compare them. The aim is to give researchers a centralized, comprehensive portal for accessing and analyzing relevant datasets, with regular updates.

Note: This repository is currently private and will be made public after the paper's acceptance. It will be provided as supplementary material.

Introduction

As part of our paper's contribution, we re-upload publicly available datasets, summarize their contents, and compare them on our GitHub page, giving researchers a single, regularly updated place to access and analyze them. Due to page constraints, only a portion of the GitHub pages is displayed here.

Datasets

BuzzFace

Description: Dataset from social media, focusing on fake news and hoaxes.

Contents:

  • Number of articles: Information not specified
  • Source types: Social media platforms

Download: Link to BuzzFace (This link will be active upon the repository's public release)

CREDBANK-data

Description: A crowd-sourced dataset of events with credibility annotations.

Contents:

  • Number of tweets: more than 60 million, grouped into 1,049 events
  • Source types: Social media platforms

Download: Link to CREDBANK-data (This link will be active upon the repository's public release)

EMERGENT

Description: Dataset of claims and their respective journalistic assessments.

Contents:

  • Number of claims: 300
  • Source types: News websites

Download: Link to EMERGENT (This link will be active upon the repository's public release)

FCV-2018

Description: Dataset for fake news detection collected in 2018.

Contents:

  • Number of articles: Information not specified
  • Source types: News websites, Social media platforms

Download: Link to FCV-2018 (This link will be active upon the repository's public release)

FEVER

Description: Fact Extraction and VERification dataset.

Contents:

  • Number of claims: 185,445
  • Source types: Various textual sources

Download: Link to FEVER (This link will be active upon the repository's public release)

FacebookHoax

Description: Dataset focusing on hoaxes and fake news spread on Facebook.

Contents:

  • Number of articles: Information not specified
  • Source types: Facebook posts

Download: Link to FacebookHoax (This link will be active upon the repository's public release)

FakeNewsNet

Description: A comprehensive data repository for fake news research.

Contents:

  • Number of articles: Information not specified
  • Source types: News websites, social media platforms

Download: Link to FakeNewsNet (This link will be active upon the repository's public release)

Fakeddit

Description: Dataset for fake news detection with multiple classes of fake news.

Contents:

  • Number of samples: over 1 million
  • Source types: Reddit posts

Download: Link to Fakeddit (This link will be active upon the repository's public release)

LIAR

Description: A benchmark dataset for fake news detection.

Contents:

  • Number of statements: 12,836
  • Source types: PolitiFact statements

Download: Link to LIAR (This link will be active upon the repository's public release)

M4

Description: Multimodal dataset for fake news detection.

Contents:

  • Number of articles: Information not specified
  • Source types: News websites, social media platforms

Download: Link to M4 (This link will be active upon the repository's public release)

MisInfoText

Description: Text-based dataset for misinformation research.

Contents:

  • Number of articles: Information not specified
  • Source types: Various textual sources

Download: Link to MisInfoText (This link will be active upon the repository's public release)

NELA-GT-2018

Description: A large dataset for misinformation research collected in 2018.

Contents:

  • Number of articles: 713,000
  • Source types: News websites

Download: Link to NELA-GT-2018 (This link will be active upon the repository's public release)

Verification-corpus

Description: Dataset for claim verification research.

Contents:

  • Number of claims: Information not specified
  • Source types: Various sources

Download: Link to Verification-corpus (This link will be active upon the repository's public release)

benjamin-political-news-dataset

Description: Political news dataset collected for research purposes.

Contents:

  • Number of articles: Information not specified
  • Source types: News websites

Download: Link to benjamin-political-news-dataset (This link will be active upon the repository's public release)

buzzfeed

Description: Dataset collected from BuzzFeed news articles.

Contents:

  • Number of articles: Information not specified
  • Source types: News websites

Download: Link to buzzfeed (This link will be active upon the repository's public release)

pheme

Description: Dataset focusing on rumors and fake news.

Contents:

  • Number of articles: Information not specified
  • Source types: Social media platforms

Download: Link to pheme (This link will be active upon the repository's public release)

Usage

Installation

To use these datasets, clone the repository:

git clone https://github.com/fakenewsresearch/dataset.git
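
Once the clone has finished, a quick way to see which datasets are present is to list the top-level folders of the working copy. The sketch below assumes that each dataset lives in its own top-level directory, which this README does not confirm. The helper name `list_dataset_dirs` is hypothetical and is not part of the repository.

```python
from pathlib import Path


def list_dataset_dirs(repo_root):
    """Return the sorted names of non-hidden top-level directories.

    Assumption: each dataset is stored in its own top-level directory
    of the cloned repository. Hidden directories such as .git are skipped.
    """
    root = Path(repo_root)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and not entry.name.startswith(".")
    )
```

By default, the clone command above creates a directory named `dataset`, so calling `list_dataset_dirs("dataset")` from the directory where the command was run would print the folder names found there.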