chengzeyi/stable-fast

What's the advantage compared to TensorRT?


Thank you for this great work. It's amazing that it can reach almost the same performance as TensorRT on NVIDIA GPUs!

However, is there a convincing reason for using it instead of TensorRT?

In fact, there are many reasons!
Here are some:

  • Fastest compilation: cold-start your whole pipeline in just 10 s, hundreds of times faster than TRT.
  • Better support for various models: supports all SD models, even the latest LCM and SD Turbo, which TRT does not support well.
  • Open source and portable to other platforms: AMD GPUs can also be supported in theory. This is something TRT will never do.
  • Full dynamic shape support: this is something TRT struggles with.

Dynamic shape and AMD support are really attractive points!

A suggestion: would you consider adding a benchmark against oneflow/diffuser? It claimed even faster speed than TensorRT last year, it recently added support for SDXL Turbo, and I've heard its performance is amazing.

BTW, is there a roadmap, contributing guide, or community group? I'm quite interested in this work and may participate in it in the future. :)

There should be a Discord group at some point 🤔.
A detailed benchmark will be conducted soon on some of the most powerful hardware platforms 😎.
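A comparison like the benchmark discussed in this thread is only meaningful if every backend is measured the same way. The sketch below is a minimal, framework-agnostic timing harness. It assumes each backend (stable-fast, TensorRT, oneflow/diffuser, ...) has been wrapped in a zero-argument callable that runs one full generation with identical settings. The `bench` helper is hypothetical and not part of stable-fast. It reports the first call separately as a rough cold-start figure, since that call absorbs one-time costs such as compilation.

```python
import statistics
import time
from typing import Callable, Dict


def bench(fn: Callable[[], object], warmup: int = 3, iters: int = 10) -> Dict[str, float]:
    """Time a zero-argument callable; return latencies in milliseconds.

    Hypothetical helper for illustration only. For GPU workloads, `fn` must
    block until its work is finished (e.g. end with torch.cuda.synchronize()),
    otherwise only the asynchronous kernel launches are measured.
    """
    if warmup < 0 or iters < 1:
        raise ValueError("warmup must be >= 0 and iters must be >= 1")

    # The first call includes one-time costs (compilation, graph capture,
    # autotuning), so it is reported on its own as a cold-start figure.
    start = time.perf_counter()
    fn()
    first_ms = (time.perf_counter() - start) * 1000.0

    # Extra warm-up calls are run but excluded from the steady-state samples.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    return {
        "first_ms": first_ms,
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workloads; in a real comparison each entry would run the same
    # prompt, resolution, and step count through a different backend.
    backends = {
        "sleep_1ms": lambda: time.sleep(0.001),
        "sleep_2ms": lambda: time.sleep(0.002),
    }
    for name, fn in backends.items():
        print(name, bench(fn))
```

Reporting the median rather than the mean keeps one slow outlier (for example, a recompilation triggered by a new input shape) from skewing the steady-state number. That outlier is still visible in `max_ms`.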