hidet-org/hidet

BART, Pegasus, and GPT2 model benchmarks are slower than vanilla ORT

varshith15 opened this issue · 4 comments

Hey @yaoyaoding!
First of all, amazing work with Hidet!
I have recently been experimenting with hidet to see if it can outperform ORT.
Surprisingly, ORT with IO binding on an ONNX graph (BART, Pegasus, GPT2) without any graph optimisations outperforms hidet's optimised flow graph, even with search space 2 (on an NVIDIA A100).
Did you previously run any benchmark comparisons between hidet and ORT? I would love to help debug this!
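For reference, the ORT side of my comparison looks roughly like this (a minimal sketch; the model path, input/output names, and shapes are placeholders for the actual exported graphs):

```python
# Minimal sketch of the ORT-with-IO-binding side of the comparison.
# "model.onnx", the tensor names, and the shapes are placeholders for the
# actual exported BART/Pegasus/GPT2 graphs.
import time

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])

input_ids = np.random.randint(0, 50257, size=(1, 128), dtype=np.int64)

# Bind the input once and keep the output on the GPU, so the timed loop
# does not pay for per-call host/device copies.
binding = sess.io_binding()
binding.bind_cpu_input("input_ids", input_ids)
binding.bind_output("logits", device_type="cuda")

for _ in range(10):  # warm-up
    sess.run_with_iobinding(binding)

start = time.perf_counter()
for _ in range(100):
    sess.run_with_iobinding(binding)
print(f"mean latency: {(time.perf_counter() - start) / 100 * 1e3:.2f} ms")
```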

Also, I have experimented with transformer-deploy, which performs better than both vanilla ORT and hidet. Replicating the optimisations from transformer-deploy would be a good next step, and I would love to help with that as well!

Hi @varshith15,

Thanks for your interest in hidet.

Could you please provide your benchmark script so that we can have a look?

Hi @yaoyaoding, sorry for the delay. Please check this notebook.
Try the ghcr.io/els-rd/transformer-deploy:0.5.4 Docker image; it already has the transformer_deploy package installed.

Hi @varshith15,

Thanks for the script! We are still working on optimizing GPT models with the fp16 data type and the fused attention operator, and have made good progress. We will release the new version in the next couple of weeks. Please stay tuned!
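For context, the optimized flow-graph path that this work targets looks roughly like the sketch below, following the current ONNX frontend tutorial. The input names and shapes are placeholders, and the fp16 pass option reflects in-progress work, so the exact API may differ in the released version:

```python
# Rough sketch of hidet's optimized flow-graph path at search space 2.
# Shapes/dtypes are placeholders; the fp16 option is part of the
# in-progress work and may change before release.
import hidet

hidet.option.search_space(2)  # more aggressive kernel tuning

model = hidet.graph.frontend.from_onnx("model.onnx")

# Trace the model with a symbolic input to build a FlowGraph.
data = hidet.zeros([1, 128], dtype="int64", device="cuda")
symbolic_out = model(hidet.symbol_like(data))
graph = hidet.trace_from(symbolic_out)

# Optimize the graph; the fp16 precision pass is what the upcoming
# release focuses on for transformer models.
with hidet.graph.PassContext() as ctx:
    ctx.set_precision("float16")
    graph_opt = hidet.graph.optimize(graph)

output = graph_opt(data)  # run the optimized graph with a concrete input
```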

Hey @yaoyaoding
That's nice to hear; I'm looking forward to the release. I'd love to contribute, so let me know if I can help in any capacity!
Thanks