Very high CPU usage during inference; GPU seems to be idle

Question

xuduo18 opened this issue 2 years ago · 1 comment

I have tried the 8-bit option as well, but it made no difference.

Tokens are generated slowly and CPU usage goes high (>80%). GPU usage jumps up too, but always stays below 20%. So inference seems to be using mostly the CPU rather than the GPU.

So does it run inference on the GPU by default?

Answer 1 · 2023-04-22T03:23:43.000Z

This seems to be a problem with int8. In our tests, it is indeed slower than fp16. We will investigate this.
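
On the original question of whether inference runs on the GPU, one way to check is to look at where the model's weights actually live. The following is only a minimal sketch, assuming the model is a PyTorch `torch.nn.Module` (the issue does not name the framework). `summarize_devices` is a hypothetical helper name, not part of any library; it relies only on `parameters()`, `.device`, and `.numel()`.

```python
from collections import Counter


def summarize_devices(model):
    """Return {device_name: parameter_count} for a loaded model.

    Hypothetical helper: works with any object that exposes
    .parameters(), as torch.nn.Module does (assumed here).
    """
    counts = Counter()
    for param in model.parameters():
        # str(device) gives e.g. "cuda:0" or "cpu" for PyTorch tensors.
        counts[str(param.device)] += param.numel()
    return dict(counts)


# Usage, assuming PyTorch is installed and `model` is already loaded:
#   import torch
#   print(torch.cuda.is_available())   # False means no usable GPU was found
#   print(summarize_devices(model))    # look for "cuda:*" keys vs "cpu"
```

If most parameters are reported under `cpu`, the model was never moved to the GPU. If they are under `cuda:0` and the GPU is still mostly idle, the slowdown is more likely in the int8 path described above.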