Hard release criteria: Run and get convergence data on long-running tests
gnadathur commented
- Run on 64 A100 GPUs
- Later, run on 64 H100 GPUs
gnadathur commented
What are the hyperparameters for the convergence run?
- Adjusted the batch size to 1.
- What should the learning rate be? @wanchaol, @lessw2020: maybe we should duplicate the earlier convergence tests from FSDP1.
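When matching the earlier FSDP1 convergence tests, it may help to record the effective global batch size of each run. Below is a minimal sketch. It assumes that "batch size 1" means one sequence per GPU, that the run uses pure data parallelism across all 64 GPUs, and that there is no gradient accumulation. `ConvergenceRunConfig` and the `seq_len=2048` value are hypothetical and do not come from this thread.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceRunConfig:
    """Hypothetical record of one convergence run; field names are illustrative."""

    num_gpus: int          # e.g. 64 A100s now, 64 H100s later
    local_batch_size: int  # sequences per GPU per step (assumed meaning of "batch size")
    seq_len: int           # tokens per sequence (assumed value, not from the issue)

    def global_batch_size(self) -> int:
        # Assumes pure data parallelism with no gradient accumulation:
        # every GPU processes its own distinct micro-batch each step.
        return self.num_gpus * self.local_batch_size

    def tokens_per_step(self) -> int:
        # Total tokens consumed by one optimizer step across all GPUs.
        return self.global_batch_size() * self.seq_len


a100_run = ConvergenceRunConfig(num_gpus=64, local_batch_size=1, seq_len=2048)
h100_run = ConvergenceRunConfig(num_gpus=64, local_batch_size=1, seq_len=2048)

print(a100_run.global_batch_size())  # 64
print(a100_run.tokens_per_step())    # 131072
```

Keeping the A100 and H100 runs at the same global batch size and tokens per step means that any difference in their loss curves cannot come from a changed effective batch. This matters most if the learning rate is copied from the FSDP1 runs.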