MT-BERT

One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers