layer6ai-labs/T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
Python · MIT license
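For orientation, below is a minimal sketch of the T-Fixup-style initialization the paper describes, written against a plain `torch.nn.Transformer` rather than this repo's fairseq-based code. The function name `t_fixup_init` and the exact parameters touched (attention value/output projections and FFN weights, scaled by 0.67 * N^(-1/4) in the encoder and (9 * M)^(-1/4) in the decoder after Xavier initialization) are our reading of the paper, not taken from this repository, so treat it as an illustrative assumption rather than the reference implementation.

```python
import torch.nn as nn


def t_fixup_init(model: nn.Transformer, d_model: int,
                 num_encoder_layers: int, num_decoder_layers: int) -> None:
    """Sketch of T-Fixup-style scaling on a vanilla nn.Transformer.

    Assumed scheme (see the paper for the authoritative rules):
    Xavier-initialize all weight matrices, then rescale the attention
    value/output projections and the FFN weights by
    0.67 * N^(-1/4) (encoder) and (9 * M)^(-1/4) (decoder).
    Removing layer norm and the warmup schedule is not handled here.
    """
    enc_scale = 0.67 * num_encoder_layers ** (-0.25)
    dec_scale = (9 * num_decoder_layers) ** (-0.25)

    # Xavier init for every matrix-shaped parameter.
    for p in model.parameters():
        if p.dim() > 1:
            nn.init.xavier_uniform_(p)

    for layer in model.encoder.layers:
        # Rows [2*d_model:] of in_proj_weight are the value projection.
        layer.self_attn.in_proj_weight.data[2 * d_model:] *= enc_scale
        layer.self_attn.out_proj.weight.data *= enc_scale
        layer.linear1.weight.data *= enc_scale
        layer.linear2.weight.data *= enc_scale

    for layer in model.decoder.layers:
        for attn in (layer.self_attn, layer.multihead_attn):
            attn.in_proj_weight.data[2 * d_model:] *= dec_scale
            attn.out_proj.weight.data *= dec_scale
        layer.linear1.weight.data *= dec_scale
        layer.linear2.weight.data *= dec_scale
```

Usage would be a single call after building the model, e.g. `t_fixup_init(model, d_model=512, num_encoder_layers=6, num_decoder_layers=6)`; whether the FFN (MLP) weights are scaled exactly this way is the subject of issue #5 below.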
Issues
- Details for initializing FFN (MLP blocks)? (#5, opened by zhuchen03)
- Does adding layer norm together with T-Fixup make the model even better, or does T-Fixup make layer norm completely unnecessary (i.e. no performance gain)? (#7, opened by yxchng)
- T-Fixup for Language Modeling (#6, opened by sairams-intel)
- FP16 Training (#1, opened by libeineu)
- Possible minor typo in the paper (#2, opened by kdexd)