LiyuanLucasLiu/Transformer-Clinic

Understanding the Difficulty of Training Transformers

Python, Apache-2.0

Issues

Position of residual connection in PreLN architecture is wrong
#27 opened 2 years ago by bilzard
1
How to get the beta_{i,j} for each residual branch?
#26 opened 2 years ago by SefaZeng
0
Admin for 100L-100L model?
#24 opened 3 years ago by Vincent131499
1
Ensemble models
#23 opened 3 years ago by Vincent131499
0
How to add RAdam to fairseq?
#22 opened 4 years ago by KelleyYin
1
argdict
#21 opened 4 years ago by riosempre
1
Reimplement Admin in new fairseq but get bad valid loss
#20 opened 4 years ago by moonscar
0
Question about the adaptive optimizer
#19 opened 4 years ago by chenwydj
1
Difference of implementation from the original paper
#18 opened 4 years ago by wade3han
1
`RuntimeError: expected scalar type Float but found Half` during the eval step
#17 opened 4 years ago by ruiningh
5
Scripts for Post-LN in Figure 10?
#16 opened 4 years ago by zhuchen03
1
Is wmt14en-fr.sh missing in pre-process dir?
#15 opened 4 years ago by lvzaihefang
1
wmt_en_de admin: Function 'SoftmaxBackward' returned nan values in its 0th output.
#14 opened 4 years ago by sshleifer
8
tmp_weight is not defined
#13 opened 4 years ago by sshleifer
4
IWSLT'14 Results
#12 opened 4 years ago by villmow
1
Post-LN with 12-12 trains fine, but 12-3 diverges
#9 opened 4 years ago by ZhenYangIACAS
9
How to make sure that only one forward pass is performed in the profiling phase?
#8 opened 4 years ago by ZhenYangIACAS
1
Is "tmp_weight" in transformer_layer.py useless?
#7 opened 4 years ago by zherowolf
3
Details of total batch size
#6 opened 4 years ago by luofuli
1
Do the embedding layer's layernorm parameters need to be reparameterized accordingly?
#5 opened 4 years ago by gotobelieve
1
Can I use a pre-trained model to initialize the model?
#4 opened 4 years ago by luofuli
1
Are "attention_ratio_change" and "fc_ratio_change" trainable or not?
#3 opened 4 years ago by gotobelieve
2
What's the meaning of the 'adaptive-scale' argument?
#1 opened 4 years ago by gotobelieve
1