Access to the script for evaluating the inference performance (average latency vs. model size)
songkq opened this issue · 3 comments
@dimapihtar @JimmyZhang12 @Davood-M Hi, does the average latency shown in the README include the time spent on preprocessing and postprocessing? Could you please provide a script for evaluating the inference performance?
From the git blame it looks like Piotr Marcinkiewicz at NVIDIA was the one who generated these results (not sure what his GitHub profile is).
These results don't include preprocessing and postprocessing.
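If you want to see how much preprocessing and postprocessing add on top of the model-only latency, here is a minimal sketch using the Hugging Face `transformers` T5 API (`t5-small` and the prompt are hypothetical stand-ins; the published numbers were produced with FasterTransformer, not this path):

```python
import time
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "t5-small"  # hypothetical stand-in for the benchmarked checkpoint
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name).eval()

text = "translate English to German: The house is wonderful."

# Preprocessing: tokenization
t0 = time.perf_counter()
inputs = tokenizer(text, return_tensors="pt")
t1 = time.perf_counter()

# Model-only latency (the part the published numbers cover)
# (on GPU, call torch.cuda.synchronize() before reading each timestamp)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
t2 = time.perf_counter()

# Postprocessing: detokenization
decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
t3 = time.perf_counter()

print(f"preprocess  : {(t1 - t0) * 1e3:.2f} ms")
print(f"model       : {(t2 - t1) * 1e3:.2f} ms")
print(f"postprocess : {(t3 - t2) * 1e3:.2f} ms")
print(f"end-to-end  : {(t3 - t0) * 1e3:.2f} ms")
```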
I'm sorry, but the performance results were obtained using tools designed for internal NVIDIA clusters, so I can't share the mT5 scripts with you.
The FasterTransformer team published scripts for T5/mT5 benchmarks here:
https://github.com/NVIDIA/FasterTransformer/blob/main/benchmarks/t5/pyt_benchmark.sh
You need the exact checkpoint and machine configuration to reproduce these results, so it may be hard to match the exact numbers.
You can also check their results:
https://github.com/NVIDIA/FasterTransformer/blob/main/docs/t5_guide.md#end-to-end-translation-performance-on-pytorch
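For what it's worth, "average latency" in benchmarks like these is usually the mean over many timed iterations after a few warm-up runs. A minimal sketch of that pattern (an assumption about the general methodology, not the exact internal tooling):

```python
import time

def average_latency_ms(fn, warmup=5, iters=20):
    """Time `fn` after warm-up runs; returns mean latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up: absorbs one-time allocation/compilation costs
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    # (on GPU, call torch.cuda.synchronize() before reading the clock)
    return (time.perf_counter() - t0) / iters * 1e3

# Usage (hypothetical): wrap the generate call from the sketch above.
# print(average_latency_ms(lambda: model.generate(**inputs, max_new_tokens=32)))
```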
@piotrm-nvidia @JimmyZhang12 Thanks.