amsgrad parameter
nicolaspanel opened this issue · 3 comments
nicolaspanel commented
Hi @CyberZHG, and thank you for sharing this!
Have you run any experiments with amsgrad=True?
If so, have you noticed a significant improvement compared to RAdam + warmup alone?
Best regards
stale commented
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.