/TC32

Text Classification Dataset for Turkish Language

TC32: Multi-Class Classification Dataset for Turkish

  • Benchmark dataset for Turkish text classification
  • It contains 430K lines across 32 categories
  • Each category has roughly 13K comments
  • Data is collected from Turkish websites
  • The data contains product comments and their product categories
  • The baseline algorithm, Naive Bayes, achieves an F1 score of 84%
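
To illustrate what a Naive Bayes baseline of this kind might look like, here is a minimal pure-Python sketch. It is not the code that produced the 84% figure. The multinomial variant with add-one smoothing, the whitespace tokenizer, the macro-averaged F1, and the toy comments and category names are all assumptions made for illustration; the source does not say which Naive Bayes variant or which F1 averaging was used.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer (assumption). str.lower() does not follow
    # Turkish casing rules for I/ı and İ/i, so real data may need more care.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict_one(self, text):
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((self.word_counts[label][tok] + 1) / total)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1 scores over the classes in y_true.
    scores = []
    for label in set(y_true):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data; the real dataset has 32 categories.
train_texts = ["telefon ekran pil", "pil şarj telefon",
               "kitap roman yazar", "roman kitap sayfa"]
train_labels = ["elektronik", "elektronik", "kitap", "kitap"]

model = MultinomialNB().fit(train_texts, train_labels)
test_texts = ["telefon pil", "roman yazar"]
test_labels = ["elektronik", "kitap"]
predictions = model.predict(test_texts)
print(predictions, macro_f1(test_labels, predictions))
```

On the full dataset, the same structure would apply with the comments as texts and the 32 product categories as labels, evaluated on a held-out split.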

Download link: https://www.kaggle.com/savasy/multiclass-classification-data-for-turkish-tc32