
A repo containing helper files for the Fake News Corpus

Apache License 2.0

Fake News Corpus Files

These files are intended for the Data Science module at the University of Copenhagen. The content is a subset of the Fake News Corpus. The files are meant only as a convenience, so that you can start working with the data without first downloading the whole dataset (about 30 GB unzipped).

Files

  • 1mio-raw.zip: Contains the first 1 million articles of the Fake News Corpus, unaltered (about 1.15 GB).
  • clean-100k.zip: Contains the first 100,000 articles of the Fake News Corpus, cleaned (about 134 MB).
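Once you have one of the archives, a minimal sketch like the following can check, list and extract it. It assumes Python 3 is installed and uses the standard-library zipfile command-line interface (`python3 -m zipfile`), so no separate unzip tool is needed. The helper name `extract_archive` and the default `data` target directory are hypothetical choices; the names of the files inside the archives are not described here, so the sketch simply extracts whatever is there.

```shell
# Hypothetical helper: test, list and extract a zip archive using
# Python's standard-library zipfile CLI.
# Usage: extract_archive ARCHIVE [TARGET_DIR]
extract_archive() {
    archive=$1
    dest=${2:-data}   # default target directory (arbitrary choice)

    # Fails with a non-zero exit code if the file is not a valid zip,
    # e.g. if it is only a Git LFS pointer file.
    python3 -m zipfile -t "$archive" || return 1

    # Show the names, modification dates and sizes of the enclosed files.
    python3 -m zipfile -l "$archive"

    # Extract everything into the target directory.
    mkdir -p "$dest"
    python3 -m zipfile -e "$archive" "$dest"
}
```

For example, `extract_archive clean-100k.zip data` would unpack the cleaned subset into `./data`. Testing the archive first gives a clear failure instead of a confusing extraction error when the download is incomplete.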

Cloning the Repo

This repo uses Git Large File Storage (Git LFS), which must be installed on your system for the repo to clone successfully. Once Git LFS is installed, set it up for your user account with:

$ git lfs install
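If the repo was cloned before Git LFS was set up, the .zip files in the working tree are small text pointer files instead of the real archives. Running `git lfs pull` inside the clone then fetches the actual content. A minimal sketch, assuming a POSIX shell, that flags such pointer files; the function name `is_lfs_pointer` is hypothetical, and the check relies on the Git LFS pointer format, whose first line is `version https://git-lfs.github.com/spec/v1`.

```shell
# Hypothetical helper: succeeds (exit 0) if the given file is a Git LFS
# pointer rather than the real content. -x matches the whole line,
# -F treats the pattern as a fixed string, -q suppresses output.
is_lfs_pointer() {
    head -n 1 "$1" 2>/dev/null |
        grep -qxF 'version https://git-lfs.github.com/spec/v1'
}

# Check every archive in the current directory (e.g. 1mio-raw.zip, clean-100k.zip).
for f in *.zip; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if is_lfs_pointer "$f"; then
        echo "$f: LFS pointer only; run 'git lfs pull' to fetch the data"
    else
        echo "$f: OK"
    fi
done
```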

Alternatively, you can simply download the files directly from the website.

Acknowledgements