chorus12/bert-finetuning-catalyst

Code for BERT classifier finetuning for multiclass text classification


Instructions:

specify your data, model, and training parameters in config.yml
if needed, customize the code for data processing in src/data.py
specify your model in src/model.py (by default, it is DistilBERT for sequence classification)
run python src/train.py
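As an illustration of the data-processing step above, here is a minimal, library-agnostic sketch of the kind of preprocessing one might customize before tokenization: reading a CSV, mapping string labels to integer class ids, and splitting into train and validation sets. The column names (`text`, `label`), the helper names, and the sample rows are all hypothetical; the actual code in src/data.py and the keys in config.yml may differ.

```python
import csv
import io
import random

# Hypothetical CSV content; real column names come from your own data.
SAMPLE_CSV = """text,label
great battery life,electronics
soft cotton shirt,clothing
fast charging cable,electronics
warm winter jacket,clothing
"""


def read_rows(handle, text_col="text", label_col="label"):
    """Read (text, label) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return [(row[text_col], row[label_col]) for row in reader]


def build_label_map(rows):
    """Map each distinct label to an integer id, in sorted order."""
    labels = sorted({label for _, label in rows})
    return {label: idx for idx, label in enumerate(labels)}


def train_valid_split(rows, valid_fraction=0.25, seed=17):
    """Shuffle deterministically and hold out a fraction for validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, int(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


rows = read_rows(io.StringIO(SAMPLE_CSV))
label_to_id = build_label_map(rows)
encoded = [(text, label_to_id[label]) for text, label in rows]
train_rows, valid_rows = train_valid_split(encoded)

print(label_to_id)
print(len(train_rows), len(valid_rows))
```

Integer class ids are what a sequence-classification head expects as targets, so building the label map once and reusing it for both splits keeps the ids consistent.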

Video tutorial

I explain the pipeline in detail in a video tutorial, which consists of four parts:

Intro: an overview of this pipeline, an introduction to the classification task, and a recap of the previous talk "Firing a cannon at sparrows: BERT vs. logreg"
Data preparation for training: from CSV files to PyTorch DataLoaders
The model: understanding the BERT classifier model by HuggingFace, digging into the code of the transformers library
Training: running the pipeline with Catalyst and GPUs
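The data, model, and training parts listed above can be sketched roughly as follows. This is not the repository's code: it uses a tiny, randomly initialized DistilBERT configuration (so nothing is downloaded), random token ids in place of tokenized text, and a plain PyTorch loop standing in for the Catalyst runner. All sizes and names are assumptions chosen for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import DistilBertConfig, DistilBertForSequenceClassification

NUM_CLASSES = 3   # hypothetical number of target classes
VOCAB_SIZE = 100  # toy vocabulary, not the real tokenizer's
SEQ_LEN = 16

# Data: random token ids stand in for tokenized CSV text.
torch.manual_seed(0)
input_ids = torch.randint(1, VOCAB_SIZE, (8, SEQ_LEN))
attention_mask = torch.ones_like(input_ids)
labels = torch.randint(0, NUM_CLASSES, (8,))
loader = DataLoader(
    TensorDataset(input_ids, attention_mask, labels), batch_size=4
)

# Model: a tiny DistilBERT classifier built from a config (no pretrained weights).
config = DistilBertConfig(
    vocab_size=VOCAB_SIZE,
    dim=32,
    hidden_dim=64,
    n_layers=1,
    n_heads=2,
    max_position_embeddings=SEQ_LEN,
    num_labels=NUM_CLASSES,
)
model = DistilBertForSequenceClassification(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# Training: one pass over the toy data in plain PyTorch (Catalyst not used here).
model.train()
losses, logit_shapes = [], []
for ids, mask, y in loader:
    optimizer.zero_grad()
    out = model(input_ids=ids, attention_mask=mask, labels=y)
    out.loss.backward()  # cross-entropy loss computed by the model from labels
    optimizer.step()
    losses.append(out.loss.item())
    logit_shapes.append(tuple(out.logits.shape))

print(losses, logit_shapes)
```

In a real run, one would load pretrained weights and a matching tokenizer instead of the toy configuration; passing `labels` to the model makes it return the classification loss alongside the logits.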

Also, see other tutorials/talks on the topic:

multi-class classification: classifying Amazon product reviews into categories, Kaggle Notebook
multi-label classification: identifying toxic comments, Kaggle Notebook
an overview of this pipeline is given in the video "Firing a cannon at sparrows: BERT vs. logreg"