gpt-omni/mini-omni

Audio Generation is Slow

MarcoFerreiraPerson opened this issue · 2 comments

Hello,

Love what you guys have done! However, when I run it on an RTX 4090 GPU or on an A100, I get similar inference speeds for audio streaming. Any ideas on why this is the case?

I think the speech portion might be running on the CPU, do you guys have any suggestions on how to move it to GPU or any other solutions?

Thank you

Yes, it's weird, but the speed on a 3090 is also similar. The device is set throughout the code, so the whole model should run on the GPU. Thank you!
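A quick way to double-check the claim above is to inspect where the model's parameters actually live. This is a minimal sketch using a stand-in `nn.Linear`; substitute the actual mini-omni model object for `model`:

```python
import torch
import torch.nn as nn

# Stand-in for the real model; replace with the loaded mini-omni model.
model = nn.Linear(8, 8)
if torch.cuda.is_available():
    model = model.to("cuda")

# Collect the device type of every parameter tensor.
devices = {p.device.type for p in model.parameters()}
print(devices)  # {'cuda'} if everything was moved, {'cpu'} otherwise
```

If the set contains `'cpu'`, some submodule was never moved to the GPU; note that even with all weights on the GPU, CPU-side pre/post-processing (e.g. audio encoding/decoding) can still dominate latency.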

Hi @MarcoFerreiraPerson, what do you mean by the speech portion? We may check on that later.
For the inference speed, the current model is the original fp32 version, and we haven't performed in-depth inference optimization, so the speed is not optimal at present.
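Since the released model runs in fp32, one common first optimization is mixed-precision inference with `torch.autocast`. This is only a sketch under the assumption of a standard PyTorch module, not the project's actual inference path; `model` and the dtype choice are illustrative:

```python
import torch
import torch.nn as nn

# Stand-in model and input; replace with the real model and audio features.
model = nn.Linear(512, 512)
x = torch.randn(4, 512)

device = "cuda" if torch.cuda.is_available() else "cpu"
model, x = model.to(device), x.to(device)

# float16 is typical on GPU; bfloat16 is the supported autocast dtype on CPU.
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

with torch.inference_mode():
    with torch.autocast(device_type=device, dtype=amp_dtype):
        y = model(x)

print(y.dtype)  # reduced-precision dtype inside the autocast region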
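placeholder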