LiyuanLucasLiu/Transformer-Clinic
Understanding the Difficulty of Training Transformers
Python, Apache-2.0
Issues
Admin for 100L-100L model?
#24 opened by Vincent131499
Ensemble models
#23 opened by Vincent131499
How to add RAdam to fairseq?
#22 opened by KelleyYin
Question about the adaptive optimizer
#19 opened by chenwydj
`RuntimeError: expected scalar type Float but found Half` during the eval step
#17 opened by ruiningh
Scripts for Post-LN in Figure 10?
#16 opened by zhuchen03
Is wmt14en-fr.sh missing in pre-process dir?
#15 opened by lvzaihefang
wmt_en_de admin: Function 'SoftmaxBackward' returned nan values in its 0th output.
#14 opened by sshleifer
tmp_weight is not defined
#13 opened by sshleifer
IWSLT'14 Results
#12 opened by villmow
How to make sure that only one forward-pass step is performed in the profiling phase?
#8 opened by ZhenYangIACAS
Details of total batch size
#6 opened by luofuli
Do the embedding layer's layernorm parameters need to be reparameterized accordingly?
#5 opened by gotobelieve