Access to the script for evaluating the inference performance (average latency vs. model size)
songkq opened this issue · 3 comments
@dimapihtar @JimmyZhang12 @Davood-M Hi, does the average latency shown in the README include the time spent on preprocessing and postprocessing? Could you please provide a script for evaluating the inference performance?
From the git blame it looks like Piotr Marcinkiewicz at NVIDIA was the one who generated these results (not sure what his GitHub profile is).
These results don't include preprocessing and postprocessing.
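If you want to see how much preprocessing and postprocessing add on top of the model-only latency, here is a minimal sketch using the Hugging Face `transformers` T5 API (`t5-small` and the prompt are hypothetical stand-ins; the published numbers were produced with FasterTransformer, not this path):

```python
import time
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "t5-small"  # hypothetical stand-in for the benchmarked checkpoint
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name).eval()

text = "translate English to German: The house is wonderful."

# Preprocessing: tokenization
t0 = time.perf_counter()
inputs = tokenizer(text, return_tensors="pt")
t1 = time.perf_counter()

# Model-only latency (the part the published numbers cover)
# (on GPU, call torch.cuda.synchronize() before reading each timestamp)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
t2 = time.perf_counter()

# Postprocessing: detokenization
decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
t3 = time.perf_counter()

print(f"preprocess  : {(t1 - t0) * 1e3:.2f} ms")
print(f"model       : {(t2 - t1) * 1e3:.2f} ms")
print(f"postprocess : {(t3 - t2) * 1e3:.2f} ms")
print(f"end-to-end  : {(t3 - t0) * 1e3:.2f} ms")
```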
I'm sorry, but the performance results were obtained using tools designed for internal NVIDIA clusters, so I can't share the mT5 scripts with you.
The FasterTransformer team published scripts for T5/mT5 benchmarks here:
https://github.com/NVIDIA/FasterTransformer/blob/main/benchmarks/t5/pyt_benchmark.sh
You need the exact checkpoint and machine configuration to reproduce these results, so it may be hard to match the exact numbers.
You can also check their results:
https://github.com/NVIDIA/FasterTransformer/blob/main/docs/t5_guide.md#end-to-end-translation-performance-on-pytorch
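For what it's worth, "average latency" in benchmarks like these is usually the mean over many timed iterations after a few warm-up runs. A minimal sketch of that pattern (an assumption about the general methodology, not the exact internal tooling):

```python
import time

def average_latency_ms(fn, warmup=5, iters=20):
    """Time `fn` after warm-up runs; returns mean latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up: absorbs one-time allocation/compilation costs
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    # (on GPU, call torch.cuda.synchronize() before reading the clock)
    return (time.perf_counter() - t0) / iters * 1e3

# Usage (hypothetical): wrap the generate call from the sketch above.
# print(average_latency_ms(lambda: model.generate(**inputs, max_new_tokens=32)))
```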
@piotrm-nvidia @JimmyZhang12 Thanks.