Crash when two messages are sent before input is generated
heriklesDM opened this issue · 0 comments
heriklesDM commented
Hello
I'm running a llama 7b model locally using llama-cpp-python compiled with cuBLAS (GPU offloading is working).
Whenever a user sends two messages before the AI has sent a response, the whole program crashes.
This might be fixed by queueing messages and generating responses one after another, or by simply ignoring new messages while a response is still being generated.
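
A minimal sketch of the queueing approach, assuming an asyncio-based message handler; `generate_reply` is a hypothetical stand-in for the project's actual llama-cpp-python call:

```python
import asyncio


def generate_reply(prompt: str) -> str:
    # Hypothetical placeholder; replace with the real llama-cpp-python call.
    return f"(model output for: {prompt})"


class MessageWorker:
    """Serializes generation: incoming messages are queued and processed
    one at a time, so a second message never reaches the model while a
    response is still being generated."""

    def __init__(self) -> None:
        self.queue = asyncio.Queue()

    async def submit(self, message: str) -> None:
        # Called from the message handler; never blocks on the model.
        await self.queue.put(message)

    async def run(self) -> None:
        while True:
            prompt = await self.queue.get()
            # Run the blocking llama-cpp call in a thread so the event
            # loop stays responsive while the model is generating.
            reply = await asyncio.to_thread(generate_reply, prompt)
            print(reply)
            self.queue.task_done()


async def main() -> None:
    worker = MessageWorker()
    asyncio.create_task(worker.run())
    await worker.submit("first message")
    await worker.submit("second message")  # queued instead of crashing
    await worker.queue.join()


asyncio.run(main())
```

Ignoring new messages during generation could be done the same way by checking a "busy" flag in `submit` instead of queueing, but queueing avoids silently dropping user input.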