How do we enable a samples-per-second metric in NeMo training?
a-cavalcanti opened this issue · 2 comments
a-cavalcanti commented
During training, the progress bar only shows this information:
Epoch 0: 1%| | 795/77225 [9:07:58<878:01:48, 41.36s/it, loss=3.1, v_num=, reduced_train_loss=3.090, global_step=794.0, consumed_samples=1.63e+6]
How do we enable some throughput metrics?
ayrnb commented
Use TensorBoard.
ethanhe42 commented
Enable wandb; it shows the iteration time. See:
https://github.com/NVIDIA/NeMo-Megatron-Launcher/blob/master/launcher_scripts/conf/training/gpt3/1b_improved.yaml#L33
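
Even without a logger, a rough average throughput can be read off the progress bar quoted in the question by dividing `consumed_samples` by the elapsed time. The sketch below is only an illustration: `parse_elapsed` and `samples_per_second` are hypothetical helper names, not part of NeMo, and the result is approximate because the progress bar prints `consumed_samples` rounded to three significant digits.

```python
def parse_elapsed(hms: str) -> int:
    """Convert a tqdm-style elapsed string such as '9:07:58' to seconds."""
    seconds = 0
    for part in hms.split(":"):
        # Each field is worth 60x the one to its right (h:m:s or m:s).
        seconds = seconds * 60 + int(part)
    return seconds


def samples_per_second(consumed_samples: float, elapsed: str) -> float:
    """Average throughput since the start of the run (hypothetical helper)."""
    return consumed_samples / parse_elapsed(elapsed)


# Values copied from the progress bar line in the question.
throughput = samples_per_second(1.63e6, "9:07:58")
print(f"{throughput:.1f} samples/s")
```

For the line above this comes out at about 50 samples per second, averaged over the whole run so far. It is not a per-iteration figure, so it will lag behind any recent change in speed.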