I am using the AG News corpus to train a ULMFiT (https://arxiv.org/abs/1801.06146) model that classifies news articles into four categories:
- World - Category 1
- Sports - Category 2
- Business - Category 3
- Sci/Tech - Category 4
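For reference, the category numbering above can be written as a small lookup table. This is only an illustrative sketch: the names `AG_NEWS_LABELS` and `label_name` are hypothetical and not taken from the repository.

```python
# Hypothetical lookup table mirroring the category list above (1-based indices).
AG_NEWS_LABELS = {
    1: "World",
    2: "Sports",
    3: "Business",
    4: "Sci/Tech",
}


def label_name(class_index: int) -> str:
    """Return the category name for a 1-based class index."""
    return AG_NEWS_LABELS[class_index]
```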
You can read about the AG News corpus here: http://xzh.me/docs/charconvnet.pdf
I trained models on the following three variants of the input fields:
- Title and Description
- Only Description
- Only Title
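The three variants could be built from the raw data roughly as follows. This is a minimal sketch, not the code used for the experiments: it assumes each CSV row holds a class index, a title, and a description in that order, and that title and description are joined with a single space. The helper `build_variants` and the sample row are made up for illustration.

```python
import csv
import io


def build_variants(row):
    """Turn one (class_index, title, description) row into the three text variants."""
    class_index, title, description = row
    return {
        "label": int(class_index),
        # Assumption: the two fields are joined with a single space.
        "title_and_description": f"{title} {description}",
        "only_description": description,
        "only_title": title,
    }


# Made-up sample row standing in for a line of the real CSV file.
sample = io.StringIO(
    '"2","Local team wins final","The home side won the cup after extra time."\n'
)
records = [build_variants(row) for row in csv.reader(sample)]
```

Each entry in `records` then carries the label plus one text field per variant, so the same language model and classifier pipeline can be run on each field in turn.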
Language Model Accuracy:
The language model's accuracy for the three variants is as follows:
- Title and Description - 43.4%
- Only Description - 46.2%
- Only Title - 45.5%
Classifier Accuracy:
The classifier's accuracy for the three variants is as follows:
- Title and Description - 92%
- Only Description - 93%
- Only Title - 85%
The Only Description variant is slightly more accurate than the Title and Description variant (93% vs. 92%).
The Only Title variant has the lowest accuracy (85%), since news titles are often cryptic and it is difficult to determine the category from the title alone.
Confusion Matrix:
The confusion matrices for the three variants are below:
- Title and Description
- Only Description
- Only Title
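As a reminder of how a confusion matrix for the four categories is tallied, here is a minimal pure-Python sketch. It is not the code that produced the matrices above; the function names and the toy labels are made up, and rows are assumed to be true categories and columns predicted categories.

```python
def confusion_matrix(y_true, y_pred, num_classes=4):
    """Count predictions: rows are true categories, columns are predicted ones."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        # Labels are 1-based (Category 1..4), list indices are 0-based.
        matrix[true_label - 1][predicted_label - 1] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy labels for illustration only, not results from the trained models.
y_true = [1, 2, 3, 4, 3, 4]
y_pred = [1, 2, 4, 4, 3, 3]
cm = confusion_matrix(y_true, y_pred)
```

Off-diagonal cells show which pairs of categories get mixed up; in the toy data the only errors are between categories 3 and 4.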
Due to space constraints, I have not pushed the trained models to the repository.