pmichel31415/are-16-heads-really-better-than-1
Code for the paper "Are Sixteen Heads Really Better than One?"
Language: Shell · License: MIT
Issues
- Is the code still able to run? (#9, opened by bing0037, 1 comment)
- Why do we need different normalization for all the layers compared to the last layer in BERT during importance score calculation? (#11, opened by Hritikbansal, 2 comments)
- Is BERT finetuned after pruning? (#10, opened by Huan80805, 0 comments)
- Systematic Pruning Experiments Problem (#6, opened by ChuanyangZheng, 1 comment)
- A question about run_classifier.py (#5, opened by Ixuanzhang, 2 comments)
- Not able to obtain pretrained WMT model (#4, opened by marwash25, 7 comments)
- Not able to prune the BERT model (#2, opened by ishita1995, 1 comment)
- BERT actually_prune option not working (#3, opened by pglock, 1 comment)
- No code on master? (#1, opened by aninrusimha)
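Several of the issues above concern how head pruning is applied. In the paper, "pruning" a head amounts to zeroing out (masking) that head's output before the layer's output projection, so the rest of the network runs unchanged. A minimal NumPy sketch of that masking step (the function name `mask_heads` and the tensor shapes are illustrative, not this repo's API):

```python
import numpy as np

def mask_heads(head_outputs, head_mask):
    """Zero out pruned heads before the output projection.

    head_outputs: (n_heads, seq_len, head_dim) per-head attention outputs
    head_mask:    (n_heads,) with 1.0 = keep the head, 0.0 = prune it
    """
    # Broadcasting multiplies every position/dimension of a pruned
    # head by zero, so it contributes nothing downstream.
    return head_outputs * head_mask[:, None, None]

rng = np.random.default_rng(0)
heads = rng.standard_normal((16, 4, 64))  # 16 heads, as in BERT-base layers
mask = np.ones(16)
mask[3] = 0.0                             # prune head 3
masked = mask_heads(heads, mask)
print(np.allclose(masked[3], 0.0))        # pruned head is silenced
print(np.allclose(masked[0], heads[0]))   # kept heads are untouched
```

Masking rather than physically removing weights keeps tensor shapes fixed, which is why an extra step (the `actually_prune` option discussed in #3) is needed to realize actual speedups.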