pmichel31415/are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
Language: Shell · License: MIT
Issues
- Is the code still able to run? (#9, opened by bing0037, 1 comment)
- Why do we need different normalization for all the layers compared to the last layer in BERT during importance score calculation? (#11, opened by Hritikbansal, 2 comments)
- Is BERT finetuned after pruning? (#10, opened by Huan80805, 0 comments)
- Systematic Pruning Experiments Problem (#6, opened by ChuanyangZheng, 1 comment)
- A question about run_classifier.py (#5, opened by Ixuanzhang, 2 comments)
- Not able to obtain pretrained WMT model (#4, opened by marwash25, 7 comments)
- Not able to prune the BERT model (#2, opened by ishita1995, 1 comment)
- BERT actually_prune option not working (#3, opened by pglock, 1 comment)
- No code on master? (#1, opened by aninrusimha)
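Several of the issues above concern how head pruning is applied. In the paper, "pruning" a head amounts to zeroing out (masking) that head's output before the layer's output projection, so the rest of the network runs unchanged. A minimal NumPy sketch of that masking step (the function name `mask_heads` and the tensor shapes are illustrative, not this repo's API):

```python
import numpy as np

def mask_heads(head_outputs, head_mask):
    """Zero out pruned heads before the output projection.

    head_outputs: (n_heads, seq_len, head_dim) per-head attention outputs
    head_mask:    (n_heads,) with 1.0 = keep the head, 0.0 = prune it
    """
    # Broadcasting multiplies every position/dimension of a pruned
    # head by zero, so it contributes nothing downstream.
    return head_outputs * head_mask[:, None, None]

rng = np.random.default_rng(0)
heads = rng.standard_normal((16, 4, 64))  # 16 heads, as in BERT-base layers
mask = np.ones(16)
mask[3] = 0.0                             # prune head 3
masked = mask_heads(heads, mask)
print(np.allclose(masked[3], 0.0))        # pruned head is silenced
print(np.allclose(masked[0], heads[0]))   # kept heads are untouched
```

Masking rather than physically removing weights keeps tensor shapes fixed, which is why an extra step (the `actually_prune` option discussed in #3) is needed to realize actual speedups.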