Given the title and description of a document, develop a classification model over 100 topics (the task description is in Russian).
For each document, concatenate the title and description and extract TF-IDF features over words and character n-grams. Then train a one-vs-rest (OvR) multiclass linear model with the modified Huber loss, optimized by stochastic gradient descent.
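The pipeline described above can be sketched roughly as follows. This is a minimal illustration assuming scikit-learn, not the contents of train.py. The n-gram ranges, the `char_wb` analyzer, the helper names `join_fields` and `build_model`, and the toy documents are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline


def join_fields(title, description):
    """Concatenate title and description into one text (hypothetical helper)."""
    return f"{title} {description}"


def build_model(random_state=0):
    """Word + char n-gram TF-IDF features feeding an OvR linear SGD model."""
    features = FeatureUnion([
        # Word-level TF-IDF; the (1, 2) n-gram range is an assumed setting.
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
        # Character n-grams inside word boundaries; (2, 4) is an assumed setting.
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ])
    # One binary modified-Huber classifier per topic, trained with SGD.
    clf = OneVsRestClassifier(
        SGDClassifier(loss="modified_huber", random_state=random_state)
    )
    return Pipeline([("features", features), ("clf", clf)])


# Toy data standing in for the real (title, description, topic) records.
docs = [
    ("Football final", "The team won the cup after extra time"),
    ("League results", "The striker scored two goals in the match"),
    ("New smartphone", "The device ships with a faster processor"),
    ("Laptop review", "The processor and battery life are excellent"),
    ("Election news", "The parliament voted on the new budget law"),
    ("Government plan", "The minister announced a budget reform"),
]
labels = ["sport", "sport", "tech", "tech", "politics", "politics"]

texts = [join_fields(title, description) for title, description in docs]
model = build_model()
model.fit(texts, labels)
predictions = model.predict(texts)
```

`SGDClassifier` already handles multiclass targets with a one-vs-all scheme on its own; the explicit `OneVsRestClassifier` wrapper is used here only to mirror the description above.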
This reaches an F1 score of 60.83 (see the leaderboard).
pip install -r requirements.txt
python train.py