intersun/PKD-for-BERT-Model-Compression
PyTorch implementation of Patient Knowledge Distillation for BERT Model Compression
Python
Issues
- 0
Equation 3 in the paper is a cross-entropy formula, so why does the code implement Equation 3 with KL divergence?
#14 opened by jinxinglu - 5
A question
#10 opened by Smile-shirley - 0
Some questions about layer number (model size)
#12 opened by ZLKong - 0
Not able to reproduce results
#11 opened by ashim95 - 3
Reproducing results
#9 opened by pawankmrs - 1
Trying to do distillation for regression task
#8 opened by smr97 - 2
How do I run student predictions?
#7 opened by smr97 - 2
RuntimeError: Function AddBackward0 returned an invalid gradient at index 1 - expected type torch.cuda.HalfTensor but got torch.cuda.FloatTensor
#6 opened by yg33717 - 1
question on pretrained/bert_config.json
#3 opened by Seohyeong - 2
Result is different...
#5 opened by jdh3577 - 1
Where to download the pretrained weights?
#2 opened by thudzj