/Question-Answering-with-BERT-and-Knowledge-Distillation

Fine-tuned BERT on SQuAd 2.0 Dataset. Applied Knowledge Distillation (KD) and fine-tuned DistilBERT (student) using BERT as the teacher model. Reduced the size of the original BERT by 40%.

Primary LanguageJupyter NotebookMIT LicenseMIT

Stargazers