project-baize/baize-chatbot

very high CPU during inference. GPU seems to be idle.

xuduo18 opened this issue · 1 comment

I have tried the 8-bit option as well, but there is no change.

It generates tokens slowly, and CPU usage goes high (>80%). GPU utilization jumps up too, but always stays below 20%. So it seems to be CPU-bound rather than GPU-bound.

So does it run inference on the GPU by default?

[screenshot attached]

This seems to be a problem with int8. In our tests, int8 is indeed slower than fp16. We'll investigate this.
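To answer the "does it inference on GPU?" question, one quick sanity check is to look at where the model's weights actually live. This is a minimal PyTorch sketch (not code from this repo) using a tiny stand-in `nn.Linear`; with the real Baize checkpoint you would pass the loaded model instead. If everything reports `cpu`, generation will be CPU-bound even when a GPU is present.

```python
import torch

def weight_devices(model: torch.nn.Module) -> set:
    # Collect the device type ("cpu" or "cuda") of every parameter tensor.
    return {p.device.type for p in model.parameters()}

# Hypothetical stand-in module; substitute the actual loaded model here.
toy = torch.nn.Linear(8, 8)
print(weight_devices(toy))      # {'cpu'} until explicitly moved

if torch.cuda.is_available():
    toy = toy.to("cuda")
    print(weight_devices(toy))  # {'cuda'} after moving to the GPU
```

Note that with int8 quantization some work (dequantization, outlier handling) can still run on slower code paths, so weights being on `cuda` does not rule out a quantization-related slowdown.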