/za-fake-news-2020

Dataset of South African Disinformation [Fake News] Website Data collected in 2020

MIT License

South African Disinformation [Fake News] Website Data - 2020

Give Feedback 📑: DSFSI Resource Feedback Form

Dataset Information

Our sources were investigations by the news websites MyBroadband (https://mybroadband.co.za/forum/threads/list-of-known-fake-news-sites-in-south-africa-and-beyond.879854/) and News24 (https://exposed.news24.com/the-website-blacklist/). These articles covered investigations into disinformation websites in South Africa in 2018 and compiled lists of websites suspected of publishing disinformation. Between the publication of those articles and our data collection, a number of the websites became inaccessible or went offline. We attempted to recover them through the Internet Archive's Wayback Machine, but could only retrieve partial snapshots and error messages.
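For readers who want to check whether an offline site still has an archived copy, the lookup can be sketched roughly as follows. This is a minimal illustration and not the procedure the authors used: it assumes the Internet Archive's public availability endpoint and its `archived_snapshots` / `closest` response layout, and the function names and the `example.co.za` domain are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the Internet Archive's public availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(site_url, timestamp="20180101"):
    """Build the query URL asking for the snapshot closest to `timestamp`."""
    query = urllib.parse.urlencode({"url": site_url, "timestamp": timestamp})
    return f"{AVAILABILITY_ENDPOINT}?{query}"


def closest_snapshot(response_json):
    """Return the archived URL from a decoded response, or None if absent."""
    closest = response_json.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


def lookup(site_url, timestamp="20180101"):
    """Query the endpoint over the network (hypothetical helper, not run here)."""
    url = build_availability_url(site_url, timestamp)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

A call such as `lookup("example.co.za")` returns an archived URL when a snapshot exists and `None` otherwise. A returned URL does not guarantee a complete page, which matches the partial snapshots described above.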

A web scraper worked for only one of the sources, and even there manual editing was required to remove JavaScript code and some duplicated paragraphs from the text. On most of the other websites, a web scraper did not work well because there were too many advertisements and broken page elements. Because of these problems, most of the articles were manually copied, pasted, and cleaned in flat files. In some cases, the text of an article could not be copied, and that article was not included in the South African disinformation corpus.
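The two cleaning problems mentioned above, embedded JavaScript and duplicated paragraphs, can be illustrated with a small standard-library sketch. It is not the authors' scraper: the class and function names are hypothetical, and it assumes article text sits inside `<p>` elements, which may not hold for every site.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect text inside <p> tags, ignoring script/style/noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buf = []
        self._in_p = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "p":
            self.flush_paragraph()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self.flush_paragraph()

    def handle_data(self, data):
        # Keep text only when inside a paragraph and outside skipped tags.
        if self._in_p and not self._skip_depth:
            self._buf.append(data)

    def flush_paragraph(self):
        # Collapse runs of whitespace so near-identical copies compare equal.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []
        self._in_p = False


def clean_article(html):
    """Return paragraph text with scripts removed and repeats dropped."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush_paragraph()  # keep a trailing paragraph that was never closed
    seen = set()
    unique = []
    for paragraph in parser.paragraphs:
        key = paragraph.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(paragraph)
    return "\n\n".join(unique)
```

Deduplication here only catches paragraphs that are identical after whitespace and case normalization. Pages with heavy advertising or broken markup would likely still need the manual cleaning described above.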

Online Repository link

Authors

  • Harm De Wet
  • Vukosi Marivate - @vukosi

See also the list of contributors who participated in this project.

Citing the dataset

@inproceedings{de2021fake,
  title={Is it Fake? News Disinformation Detection on South African News Websites},
  author={de Wet, Harm and Marivate, Vukosi},
  booktitle={2021 IEEE AFRICON},
  pages={1--6},
  year={2021},
  organization={IEEE}
}

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments

  • Media Monitoring Africa