The results and datasets are part of my dissertation, which will be published soon.

BERTurk Performance Analysis on Text Classification and Question Answering Tasks in Turkish Datasets

In this project, the BERTurk model was fine-tuned on the datasets for text classification and question answering tasks, using the Colab platform. The results are published in this repository.

The datasets were cleaned, standardized, and divided into training (70%), validation (20%), and test (10%) sets. In addition, the character and word counts of each input were calculated for use in visual analysis, and the elements of each sentence were extracted with the Zemberek tool and included in the datasets.
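The split and count steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used for the dissertation: the function names `split_dataset` and `add_counts`, the fixed seed, the whitespace-based word count, and the sample sentences are all assumptions made for this example. The Zemberek step is not shown.

```python
import random


def add_counts(text):
    """Attach character and word counts to one input text.

    Words are counted by splitting on whitespace (an assumption;
    the original preprocessing may tokenize differently).
    """
    return {"text": text, "char_count": len(text), "word_count": len(text.split())}


def split_dataset(rows, train=0.7, valid=0.2, seed=42):
    """Shuffle rows and split them into train/validation/test lists.

    Whatever is left after the train and validation parts becomes the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


# Hypothetical sample data: 100 short Turkish sentences.
rows = [add_counts(f"örnek cümle {i}") for i in range(100)]
train_set, valid_set, test_set = split_dataset(rows)
print(len(train_set), len(valid_set), len(test_set))
```

With 100 rows, the three parts contain 70, 20, and 10 rows respectively. For classification data, a stratified split that preserves the label ratio in each part may be preferable.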

You can find all fine-tuned models on Hugging Face.

Question Detection Datasets

Dataset	Best Model	Accuracy	Precision	Recall	F1
Dialog Dataset	ConvBERTurk	0.958773	0.951311	0.892570	0.921005
Quora Dataset	ELECTRA Base	0.959178	0.952355	0.893072	0.921762
Tweet Dataset	ELECTRA Base	0.788375	0.790655	0.788375	0.787725

Question Answering Datasets

Dataset	Best Model	Exact Match	F1
TQuad Dataset	ELECTRA Base	61.5385	80.3351
YTU Dataset	ELECTRA Base	65.0746	82.9919
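The repository does not state how Exact Match and F1 were computed for the question answering datasets. Assuming the common SQuAD-style definitions, the per-example scores could be sketched as below. The function names and the simple lowercase, whitespace-based normalization are assumptions made for this example; note that Python's `str.lower()` does not apply Turkish-specific casing rules for dotted and dotless i.

```python
from collections import Counter


def normalize(text):
    """Lowercase and split on whitespace (a simplified normalization)."""
    return text.lower().split()


def exact_match(prediction, reference):
    """Return 1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Token-level F1 between a predicted and a reference answer."""
    pred, ref = normalize(prediction), normalize(reference)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Ankara", "ankara"))
print(token_f1("ankara türkiye", "ankara"))
```

Under this assumption, the dataset-level figures in the table would be the averages of these per-example scores, multiplied by 100.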
