India WhatsApp Fake News Dataset

This repository contains news articles scraped from the Times of India between late 2017 and June 2018. The dataset consists of over 1 million articles published during this period.

The articles were then checked for keywords relating to WhatsApp-related deaths, which were a growing concern at the time.

Details

The file Data.csv lists the flagged articles, with the date, place, and the keywords mentioned.
The web scraper consists of a Scrapy spider which can be used to extract articles from the news site.
The files archivelist_finder.py and extract_csv_data.py can be used for reference in the process.
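
As a rough illustration of how a file like Data.csv might be explored, the sketch below counts rows per value of a chosen column using only the standard library. The column names (`date`, `place`, `keywords`) and the sample rows are placeholders invented for the example, not taken from the dataset; check the header of the real file before relying on them. The helper `count_by_column` is hypothetical and is not part of this repository.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count how many rows share each value in the given column.

    csv_file is any open text file (or file-like object) with a header row.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Placeholder rows standing in for Data.csv; the real column names may differ.
demo = io.StringIO(
    "date,place,keywords\n"
    "2018-05-01,PlaceA,whatsapp\n"
    "2018-05-02,PlaceA,rumour\n"
    "2018-06-01,PlaceB,whatsapp\n"
)

counts = count_by_column(demo, "place")
for place, n in counts.most_common():
    print(place, n)
```

To run this against the real file, replace `demo` with `open("Data.csv", newline="", encoding="utf-8")` and pass the actual column name.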

Labelling Data and Insights

After the text files were preprocessed, they were searched for keywords chosen to identify news articles with a good probability of being about fake news. The matching articles were then cross-checked manually to confirm that the stories did correspond to the keywords.
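
The keyword search described above can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the keyword list is hypothetical, and the function `find_keyword_hits` is an assumed name that does not come from this repository.

```python
import re

# Hypothetical keyword list; the keywords actually used are not listed here.
KEYWORDS = ["whatsapp", "rumour", "lynching"]


def find_keyword_hits(text, keywords=KEYWORDS):
    """Return the keywords that occur as whole words in text, ignoring case."""
    hits = []
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(kw)
    return hits


sample = "Police said a WhatsApp rumour spread through the village."
print(find_keyword_hits(sample))  # ['whatsapp', 'rumour']
```

Articles with at least one hit would then be candidates for manual cross-checking; whole-word matching is used here to avoid matching keywords inside longer words.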

This data was used by the BBC to help generate insights about fake news.

The complete archive containing all articles as .txt files can be found at the following link: https://drive.google.com/file/d/19IbOlTO18BAXYRQoVkWfQ6paad4v9sfB/view?usp=sharing

https://zenodo.org/badge/latestdoi/157810232

Please do cite this repository if you happen to use this dataset in your research. 😃
