Amblyopius/Stable-Diffusion-ONNX-FP16

THIS IS actually SLOWER than torch and tensorrt

lucasjinreal opened this issue · 4 comments

I didn't notice any value in doing this in onnxruntime; neither the running time nor the memory footprint looks good to me.

Hi, if you read the documentation you should have noticed it never claimed to be faster than either. You're simply not part of the target audience. It's nice that you tried it but I'm a bit confused as to why you did.

Quick sample of things that have been faster than this for months:
Torch + CUDA (Windows / Linux)
Torch + ROCm (Linux)
ONNX Runtime GPU using CUDA (Linux / Windows)
ONNX Runtime GPU using ROCm (Linux)
NOD.ai SHARK using MLIR/IREE/Vulkan (Linux / Windows)

But you may also notice the project is a few months old. As such, these are the reasons it was created and used:

  • It used to be the fastest method for AMD Cards on Windows
  • It is generally still the more stable method for AMD Cards on Windows as Shark breaks more often
  • It offered many features before Shark did
  • It is still one of the more convenient methods for AMD cards with limited VRAM as ONNX Runtime DirectML has more forgiving methods to deal with limited VRAM than some other implementations
  • If someone wants to use another ONNX Runtime implementation they can reuse the ControlNet code
  • If you want to improve the conversion to ONNX and subsequent conversion to TensorRT this would provide a starting point
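
The DirectML point above comes down to choosing an ONNX Runtime execution provider with a sensible fallback chain. A minimal sketch of that idea, assuming a hypothetical `pick_provider` helper (the preference order and helper are illustrative, not code from this repository; the provider names are the real ONNX Runtime identifiers):

```python
# Preference order: DirectML (AMD on Windows), then CUDA, else CPU.
# This fallback logic is an illustration, not this repository's code.
PREFERRED = ("DmlExecutionProvider", "CUDAExecutionProvider")

def pick_provider(available):
    """Return the first preferred provider present, falling back to CPU."""
    for provider in PREFERRED:
        if provider in available:
            return provider
    return "CPUExecutionProvider"

# In a real script you would query onnxruntime.get_available_providers()
# and pass the result to InferenceSession(..., providers=[pick_provider(...)]).
```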

PS: It probably would've made more sense to open this as a discussion, since you do not have an actual issue

@Amblyopius In my opinion, when you say something is optimized, most people would think it is actually optimal, wouldn't they?

I am not here to say optimum is not good; it's actually a great lib. I am just asking whether this is normal or not, and hoping for some advice.

Also, Microsoft has actually posted results in the onnxruntime repo showing superior speed compared with torch 2.0.

If possible, it would be better to let users know what you mean by "optimize".

Hi, I'm not really sure what context you are posting in. Neither the word optimize nor any of its derivatives is used in the documentation of the repository you're currently commenting on ...

Aside from that, I would generally disagree that optimize implies it is optimal across the board. The goal would be to optimize within a specific context. If you ask someone to optimize your car, would you be angry because it is still slower than a race car?

Note that if you have this repository confused with optimum, the optimum repo is at: https://github.com/huggingface/optimum