
Need a suggestion and confirmation regarding throughput calculation

Closed this issue · 2 comments

I have a follow-up question to closed issue #541. I am currently running BERT-Large pretraining for 30,000 iterations, and for each iteration I calculate throughput (seq/sec) as global_batch_size / elapsed_time_per_iteration. Once all the iterations are done, almost all of the per-iteration throughput values are similar, so what should I take as the final throughput for this experiment?

Also, is this the right way to calculate throughput (seq/sec)? I am running this experiment for the configurations below:
1 Node * 1 GPU
1 Node * 2 GPUs
1 Node * 4 GPUs
1 Node * 6 GPUs
1 Node * 8 GPUs
1 Node * 10 GPUs

Do I need to change anything when calculating throughput for the above configurations, or will it still be global_batch_size / elapsed_time_per_iteration?

Note that all the GPUs are H100s, and the micro batch size (batch size per GPU) is 64.

Once all the iterations are done, almost all of the per-iteration throughput values are similar, so what should I take as the final throughput for this experiment?

You can compute the average.
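
One way to compute that average is sketched below in Python. This is a minimal illustration, not code from the training script: the function name `average_throughput` and the `warmup_iterations` parameter are hypothetical, and skipping warm-up iterations is an assumption about what you may want, not something this thread requires. It returns both the mean of the per-iteration throughputs and the aggregate value (total sequences divided by total time). When the per-iteration times are nearly identical, as described in the question, the two numbers should agree closely.

```python
from statistics import mean


def average_throughput(global_batch_size, iteration_times_s, warmup_iterations=0):
    """Return (mean of per-iteration throughputs, aggregate throughput) in seq/sec.

    iteration_times_s: elapsed wall-clock time of each iteration, in seconds.
    warmup_iterations: hypothetical knob to drop the first few iterations,
                       which are sometimes slower than steady state.
    """
    times = list(iteration_times_s)[warmup_iterations:]
    if not times:
        raise ValueError("no iterations left after skipping warm-up")
    if any(t <= 0 for t in times):
        raise ValueError("iteration times must be positive")

    # Throughput of each iteration, then their arithmetic mean.
    per_iteration = [global_batch_size / t for t in times]
    mean_of_rates = mean(per_iteration)

    # Total sequences processed divided by total elapsed time.
    aggregate = global_batch_size * len(times) / sum(times)

    return mean_of_rates, aggregate
```

The aggregate value weights each iteration by how long it took, so it is the less sensitive of the two to an occasional unusually fast or slow iteration.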

Do I need to change anything when calculating throughput for the above configurations, or will it still be global_batch_size / elapsed_time_per_iteration?

Global batch size / elapsed per-iteration time will give you the throughput in sequences / second, regardless of the configuration you use.
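
As a rough sketch of how the same formula applies to each configuration, the Python snippet below assumes pure data parallelism, so that global batch size = micro batch size * number of GPUs * gradient accumulation steps. That relation and the helper names are assumptions for illustration and are not stated in this thread; if your run sets the global batch size directly, use that value instead. The elapsed time has to come from your own measurements.

```python
def global_batch_size(micro_batch_size, num_gpus, grad_accum_steps=1):
    """Global batch size under the ASSUMPTION of pure data parallelism."""
    return micro_batch_size * num_gpus * grad_accum_steps


def throughput_seq_per_sec(micro_batch_size, num_gpus,
                           elapsed_time_per_iteration_s, grad_accum_steps=1):
    """Throughput = global batch size / elapsed time per iteration (seq/sec)."""
    if elapsed_time_per_iteration_s <= 0:
        raise ValueError("elapsed time must be positive")
    gbs = global_batch_size(micro_batch_size, num_gpus, grad_accum_steps)
    return gbs / elapsed_time_per_iteration_s


if __name__ == "__main__":
    # GPU counts and micro batch size (64) are the ones listed in the question.
    for num_gpus in (1, 2, 4, 6, 8, 10):
        print(num_gpus, "GPU(s): global batch size =",
              global_batch_size(64, num_gpus))
```

The formula itself does not change between configurations; only the global batch size and the measured iteration time do.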

Thank you for your confirmation and suggestion.