MGSKD

This repository contains the implementation of MGSKD (Multi-Granularity Structural Knowledge Distillation for Language Model Compression), our knowledge distillation method published at ACL 2022.

Our implementation is largely based on TinyBERT. We initialize the student model with the generally distilled TinyBERT-4-312 and distill it on the same augmented datasets. We provide the student checkpoints obtained after MGSKD distillation for further evaluation: they can be fine-tuned directly on GLUE, or further distilled with the teacher's prediction distribution before evaluation, as sketched below.
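
As a rough illustration of the second option (prediction-layer distillation with the teacher's output distribution), the snippet below is a minimal sketch assuming HuggingFace Transformers-style sequence classification models and hypothetical checkpoint paths; it is not the training script shipped with this repository.

```python
import torch
import torch.nn.functional as F
from transformers import BertForSequenceClassification

# Hypothetical paths: substitute the released MGSKD student checkpoint
# and your task-specific fine-tuned teacher.
student = BertForSequenceClassification.from_pretrained("path/to/mgskd_student", num_labels=2)
teacher = BertForSequenceClassification.from_pretrained("path/to/teacher", num_labels=2)
teacher.eval()

def prediction_distillation_loss(input_ids, attention_mask, temperature=1.0):
    """Soft cross-entropy between the teacher's and student's prediction distributions."""
    with torch.no_grad():
        teacher_logits = teacher(input_ids=input_ids, attention_mask=attention_mask).logits
    student_logits = student(input_ids=input_ids, attention_mask=attention_mask).logits

    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Minimize the cross-entropy of the student's predictions against the teacher's soft labels.
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```

In a training loop, this loss would be computed on batches from the task's (augmented) training set and backpropagated through the student only, while the teacher stays frozen.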