This repository contains news articles scraped from the Times of India, from late 2017 to June 2018. The dataset consists of over 1 million articles scraped during this period.
The data was then checked for keywords relating to WhatsApp-related deaths, which were a growing concern at that time.
- The file `Data.csv` has the following fields: the date, place, and the keywords mentioned.
- The `webscrapper` consists of a `scrapy` spider which can be used to extract files from the news site.
- The files `archivelist_finder.py` and `extract_csv_data.py` can be used for reference in the process.
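
As a rough illustration of how `Data.csv` might be explored, the sketch below counts how many rows share each value in a chosen column. It assumes the file has a header row; the column name `place` used in the usage note is an assumption based on the description above, not a confirmed header, and `count_by_column` is a hypothetical helper that is not part of this repository.

```python
import csv
from collections import Counter


def count_by_column(csv_path, column):
    """Count rows per distinct value in `column` (hypothetical helper).

    Assumes the CSV has a header row; rows where the column is
    missing or empty are skipped.
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            value = (row.get(column) or "").strip()
            if value:
                counts[value] += 1
    return counts
```

For example, `count_by_column("Data.csv", "place").most_common(10)` would list the ten most frequent places, provided a column with that name exists.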
After the text files were preprocessed, they were searched for keywords selected to find news articles that had a good probability of being about Fake News. The matches were then cross-checked to see whether the stories did correspond to them.
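
The keyword search described above can be sketched roughly as follows. This is a minimal illustration, not the code used to build the dataset: the keyword list is a placeholder (the actual keywords are not listed here), and `find_keywords` and `scan_directory` are hypothetical helper names. It assumes one article per `.txt` file.

```python
import re
from pathlib import Path

# Placeholder keywords for illustration only; the real list is an assumption.
KEYWORDS = ["whatsapp", "rumour", "lynching"]


def find_keywords(text, keywords=KEYWORDS):
    """Return the keywords that occur as whole words in `text` (case-insensitive)."""
    lowered = text.lower()
    return [
        kw for kw in keywords
        if re.search(r"\b" + re.escape(kw.lower()) + r"\b", lowered)
    ]


def scan_directory(directory, keywords=KEYWORDS):
    """Map each .txt file name in `directory` to the keywords it contains.

    Files with no matching keyword are left out of the result.
    """
    hits = {}
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        matched = find_keywords(text, keywords)
        if matched:
            hits[path.name] = matched
    return hits
```

Articles flagged this way are only candidates; as noted above, they still need to be checked by hand to confirm that the story is actually relevant.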
This data was used by the BBC to help generate useful insights about Fake News.
The complete set of articles, in the form of `.txt` files, can be found at the following link: https://drive.google.com/file/d/19IbOlTO18BAXYRQoVkWfQ6paad4v9sfB/view?usp=sharing
https://zenodo.org/badge/latestdoi/157810232
Please do cite this repository if you happen to use this dataset in your research. 😃