hidet-org/hidet

BART, Pegasus, and GPT2 model benchmarks are slower than vanilla ORT

varshith15 opened this issue · 4 comments

Hey @yaoyaoding!
First of all, amazing work with Hidet!
I have recently been experimenting with hidet to see if it can outperform ORT.
Surprisingly, ORT with IO binding on an ONNX graph (BART, Pegasus, GPT2) without any graph optimisations outperforms hidet's optimised flow graph, even with search space 2 (on an NVIDIA A100).
Did you previously run any benchmark comparisons between hidet and ORT? I would love to help debug this!
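For reference, the ORT side of my comparison looks roughly like this (a minimal sketch; the model path, input/output names, and shapes are placeholders for the actual exported graphs):

```python
# Minimal sketch of the ORT-with-IO-binding side of the comparison.
# "model.onnx", the tensor names, and the shapes are placeholders for the
# actual exported BART/Pegasus/GPT2 graphs.
import time

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])

input_ids = np.random.randint(0, 50257, size=(1, 128), dtype=np.int64)

# Bind the input once and keep the output on the GPU, so the timed loop
# does not pay for per-call host/device copies.
binding = sess.io_binding()
binding.bind_cpu_input("input_ids", input_ids)
binding.bind_output("logits", device_type="cuda")

for _ in range(10):  # warm-up
    sess.run_with_iobinding(binding)

start = time.perf_counter()
for _ in range(100):
    sess.run_with_iobinding(binding)
print(f"mean latency: {(time.perf_counter() - start) / 100 * 1e3:.2f} ms")
```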

Also, I have experimented with transformer-deploy, which performs better than both vanilla ORT and hidet. Replicating the optimisations from transformer-deploy would be a good next step, and I would love to help with that as well!

Hi @varshith15,

Thanks for your interest in hidet.

Could you please provide your benchmark script so that we can have a look?

Hi @yaoyaoding, sorry for the delay. Please check this notebook.
Try the ghcr.io/els-rd/transformer-deploy:0.5.4 Docker image; it already has the transformer_deploy package installed.

Hi @varshith15,

Thanks for the script! We are still working on optimizing GPT models with the fp16 data type and the fused attention operator, and have made good progress. We will release the new version in the next couple of weeks. Please stay tuned!
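For context, the optimized flow-graph path that this work targets looks roughly like the sketch below, following the current ONNX frontend tutorial. The input names and shapes are placeholders, and the fp16 pass option reflects in-progress work, so the exact API may differ in the released version:

```python
# Rough sketch of hidet's optimized flow-graph path at search space 2.
# Shapes/dtypes are placeholders; the fp16 option is part of the
# in-progress work and may change before release.
import hidet

hidet.option.search_space(2)  # more aggressive kernel tuning

model = hidet.graph.frontend.from_onnx("model.onnx")

# Trace the model with a symbolic input to build a FlowGraph.
data = hidet.zeros([1, 128], dtype="int64", device="cuda")
symbolic_out = model(hidet.symbol_like(data))
graph = hidet.trace_from(symbolic_out)

# Optimize the graph; the fp16 precision pass is what the upcoming
# release focuses on for transformer models.
with hidet.graph.PassContext() as ctx:
    ctx.set_precision("float16")
    graph_opt = hidet.graph.optimize(graph)

output = graph_opt(data)  # run the optimized graph with a concrete input
```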

Hey @yaoyaoding
That's nice to hear; I'm looking forward to the release. I'd love to contribute, so let me know if I can help in any capacity!
Thanks