Crash when two messages are sent before input is generated
heriklesDM opened this issue · 0 comments
heriklesDM commented
Hello
I'm running a llama 7b model locally using llama-cpp-python compiled with cuBLAS (GPU offloading is working).
Whenever a user sends two messages before the AI has sent a response, the whole program crashes.
This might be fixed by queueing messages and generating responses one after another, or by simply ignoring new messages while a response is still being generated.
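
A minimal sketch of the queueing approach, assuming an asyncio-based message handler; `generate_reply` is a hypothetical stand-in for the project's actual llama-cpp-python call:

```python
import asyncio


def generate_reply(prompt: str) -> str:
    # Hypothetical placeholder; replace with the real llama-cpp-python call.
    return f"(model output for: {prompt})"


class MessageWorker:
    """Serializes generation: incoming messages are queued and processed
    one at a time, so a second message never reaches the model while a
    response is still being generated."""

    def __init__(self) -> None:
        self.queue = asyncio.Queue()

    async def submit(self, message: str) -> None:
        # Called from the message handler; never blocks on the model.
        await self.queue.put(message)

    async def run(self) -> None:
        while True:
            prompt = await self.queue.get()
            # Run the blocking llama-cpp call in a thread so the event
            # loop stays responsive while the model is generating.
            reply = await asyncio.to_thread(generate_reply, prompt)
            print(reply)
            self.queue.task_done()


async def main() -> None:
    worker = MessageWorker()
    asyncio.create_task(worker.run())
    await worker.submit("first message")
    await worker.submit("second message")  # queued instead of crashing
    await worker.queue.join()


asyncio.run(main())
```

Ignoring new messages during generation could be done the same way by checking a "busy" flag in `submit` instead of queueing, but queueing avoids silently dropping user input.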