
Fake news headlines in the Tamil regional language.


  • This Tamil fake news text corpus has 5,273 rows
  • The headlines were collected automatically from various verified websites using web scraping tools
  • Download the dataset here: Dataset
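The README does not describe the file format of the download. Assuming it is a UTF-8 CSV file with a header row, a minimal loader could look like the sketch below. The column names `text` and `label` are hypothetical placeholders and should be changed to match the actual header.

```python
import csv


def load_corpus(path, text_col="text", label_col="label"):
    """Read (headline, label) pairs from a UTF-8 CSV file.

    text_col and label_col are hypothetical defaults; adjust them to
    the header row of the downloaded file.
    """
    rows = []
    # Tamil script is non-ASCII, so the encoding is given explicitly.
    # "utf-8-sig" also drops a byte-order mark if the file starts with one.
    with open(path, encoding="utf-8-sig", newline="") as f:
        for record in csv.DictReader(f):
            headline = record[text_col].strip()
            label = record[label_col].strip().lower()  # e.g. "Fake" -> "fake"
            rows.append((headline, label))
    return rows
```

Usage, with a hypothetical file name: `rows = load_corpus("tamil_fake_news.csv")`.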

Corpus Statistics

News label   Count
Fake         2949
Real         2324
Total        5273
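The class balance follows directly from the counts above: fake headlines make up about 55.93% of the corpus and real ones about 44.07%. A classifier that always predicts "fake" would therefore reach roughly 55.9% accuracy on a test split with the same class ratio (an assumption, since the split is not described), which is a useful floor when reading the baseline accuracies. A small Python sketch of that arithmetic:

```python
# Label counts copied from the Corpus Statistics table above.
COUNTS = {"fake": 2949, "real": 2324}


def class_shares(counts):
    """Return each label's fraction of the corpus."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = class_shares(COUNTS)
# Accuracy of always predicting the most frequent label ("fake").
majority_accuracy = max(shares.values())

for label, share in shares.items():
    print(f"{label}: {share:.2%}")
print(f"majority-class accuracy: {majority_accuracy:.2%}")
```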


  • The corpus covers the following domains:
Domain                                           Label           Count
Politics                                         politics        1674
Miscellaneous (individual opinions, political)   miscellaneous   1521
Business/Science                                 tech             966
Entertainment                                    entertainment    589
Sports                                           sport            476
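The five listed domain counts add up to 5,226, which is 47 fewer than the 5,273 total rows, so the percentages computed below are relative to the sum of the listed domain counts rather than to the full corpus. Politics (about 32.0%) and miscellaneous (about 29.1%) together account for most of it. A minimal Python sketch of the calculation:

```python
# Domain counts copied from the table above.
DOMAIN_COUNTS = {
    "politics": 1674,
    "miscellaneous": 1521,
    "tech": 966,
    "entertainment": 589,
    "sport": 476,
}


def domain_percentages(counts):
    """Return each domain's share, in percent, of the summed counts."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


for label, pct in domain_percentages(DOMAIN_COUNTS).items():
    print(f"{label:<14} {pct:5.1f}%")
```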

Baseline Models

Model                    Accuracy
Support Vector Machine   87.85%
Logistic Regression      86.80%
Naive Bayes              85.46%
XGBoost                  85.08%
RNN (2 LSTM layers)      75.04%
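The README does not state the features, preprocessing, or train/test split behind these numbers. Purely as an illustration of how the Naive Bayes baseline works, here is a self-contained multinomial Naive Bayes classifier with add-one (Laplace) smoothing over whitespace tokens. The tokenizer, the smoothing value, and all function names are assumptions, and this sketch is not expected to reproduce the 85.46% figure.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenization: an assumption, not the authors' preprocessing.
    return text.split()


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for wc in self.word_counts.values() for w in wc}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def _log_score(self, tokens, label):
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)  # log prior
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # skip tokens never seen in training
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda c: self._log_score(tokens, c))


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the gold label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)
```

Working in log space avoids numeric underflow when multiplying many small word probabilities. In practice, scikit-learn's `MultinomialNB`, `LogisticRegression`, and `LinearSVC` combined with `TfidfVectorizer` are a common way to build all three linear baselines; whether the authors used them is not stated here.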


14/11/2022: Our paper, "A Novel Dataset for Fake News Detection in Tamil Regional Language", has been accepted at SPELLL 2022!

03/06/2023: Read our paper in Speech and Language Technologies for Low-Resource Languages.