Multilabel-Text-Classification-BERT

Classifying multi-label texts with BERT, fine-tuned with PyTorch Lightning on a custom dataset


Classifying Multi-label texts with fine-tuned BERT & PyTorch Lightning

I classified multi-label texts from a Kaggle competition using PyTorch Lightning. The classifier is the BERT-base model from the Hugging Face Transformers library, fine-tuned on the competition dataset with Lightning.

🚀 The Result

The BERT-base model fine-tuned on our custom dataset reaches an average F1-score of 0.70. Most of the remaining errors come from tags with few training samples, where the model makes the bulk of its mistakes.

✏ Tech Stack for Project Development

  • Python
  • 🤗 Transformers
  • PyTorch Lightning
  • Pandas
  • NumPy
  • Sklearn

🧠 Approach taken

  1. Tokenized the text with the BERT tokenizer and created a PyTorch dataset (see the first sketch below)
  2. Fine-tuned the BERT model with PyTorch Lightning (second sketch)
  3. Made predictions with the fine-tuned BERT model (third sketch)
  4. Evaluated the performance of the model for each class (third sketch)
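
A minimal sketch of step 1, assuming a pandas DataFrame `df` with a `text` column and one multi-hot column per tag; `LABEL_COLUMNS` and the tag names are placeholders, since the competition data is not shown here.

```python
import torch
from torch.utils.data import Dataset
from transformers import BertTokenizerFast

LABEL_COLUMNS = ["tag_a", "tag_b", "tag_c"]  # hypothetical tag names
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

class MultiLabelTextDataset(Dataset):
    """Tokenizes each text and pairs it with a multi-hot label vector."""

    def __init__(self, df, tokenizer, max_len=256):
        self.df = df.reset_index(drop=True)
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        enc = self.tokenizer(
            row["text"],
            truncation=True,
            padding="max_length",
            max_length=self.max_len,
            return_tensors="pt",
        )
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
            # multi-hot label vector, float so it works with BCE-style losses
            "labels": torch.tensor(row[LABEL_COLUMNS].values.astype(float), dtype=torch.float),
        }
```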
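
A minimal sketch of step 2: wrapping BERT-base in a LightningModule and fine-tuning it with a sigmoid + binary cross-entropy head for multi-label classification. The learning rate, batch size, and epoch count are illustrative assumptions, not the exact values used in the notebook.

```python
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from transformers import BertForSequenceClassification

class BertMultiLabelClassifier(pl.LightningModule):
    def __init__(self, n_labels, lr=2e-5):
        super().__init__()
        self.model = BertForSequenceClassification.from_pretrained(
            "bert-base-uncased",
            num_labels=n_labels,
            problem_type="multi_label_classification",  # BCEWithLogitsLoss under the hood
        )
        self.lr = lr

    def forward(self, input_ids, attention_mask, labels=None):
        return self.model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)

    def training_step(self, batch, batch_idx):
        out = self(**batch)
        self.log("train_loss", out.loss)
        return out.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.lr)

# train_ds is a MultiLabelTextDataset from the previous sketch
# train_loader = DataLoader(train_ds, batch_size=16, shuffle=True)
# trainer = pl.Trainer(max_epochs=3, accelerator="auto")
# trainer.fit(BertMultiLabelClassifier(n_labels=len(LABEL_COLUMNS)), train_loader)
```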
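
A minimal sketch of steps 3 and 4: running the fine-tuned model over a validation loader and reporting per-class precision, recall, and F1 with scikit-learn. `model`, `val_loader`, and `LABEL_COLUMNS` come from the sketches above, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
import torch
from sklearn.metrics import classification_report

model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for batch in val_loader:
        out = model(input_ids=batch["input_ids"], attention_mask=batch["attention_mask"])
        probs = torch.sigmoid(out.logits)              # independent probability per tag
        all_preds.append((probs > 0.5).int().cpu().numpy())
        all_labels.append(batch["labels"].int().cpu().numpy())

# per-class report makes it easy to spot the low-sample tags dragging down the average F1
print(classification_report(
    np.vstack(all_labels),
    np.vstack(all_preds),
    target_names=LABEL_COLUMNS,
    zero_division=0,
))
```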

🔗 Connect with me:

portfolio linkedin twitter