QwenLM/qwen.cpp

Inference quality of Qwen-14B-Chat on qwen.cpp differs from the CUDA version of Qwen-14B-Chat

wertyac opened this issue · 0 comments

We run Qwen-14B-Chat-Int4 on qwen.cpp and ask it the same question as the CUDA version. However, qwen.cpp returns a wrong answer while the CUDA version answers correctly, so the model's quality degrades when served through qwen.cpp.
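
For reference, a minimal sketch of how the two sides could be compared on the same prompt. The `qwen_cpp.Pipeline` call, the GGML/tiktoken file paths, and the checkpoint names are assumptions based on the qwen.cpp README and the Hugging Face Qwen repo, not the exact commands used here:

```python
# Hedged sketch: compare the int4 qwen.cpp output with the CUDA (transformers)
# output for the same prompt. Paths, model names, and the qwen_cpp API are
# assumptions; adjust them to the actual converted weights and checkpoint.
import qwen_cpp
from transformers import AutoTokenizer, AutoModelForCausalLM

PROMPT = "..."  # the question that produced the wrong answer

# qwen.cpp side: int4 GGML weights converted from Qwen-14B-Chat (assumed paths)
cpp_pipeline = qwen_cpp.Pipeline("qwen14b-chat-q4_0.bin", "Qwen-14B-Chat/qwen.tiktoken")
cpp_answer = cpp_pipeline.chat([PROMPT])

# CUDA side: the original checkpoint through transformers on a GPU
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-14B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-14B-Chat", device_map="cuda", trust_remote_code=True
).eval()
cuda_answer, _ = model.chat(tokenizer, PROMPT, history=None)

print("qwen.cpp (int4):", cpp_answer)
print("CUDA:", cuda_answer)
```

If the CUDA answer is correct and the qwen.cpp answer is not, the gap is likely introduced by the int4 quantization or by the GGML conversion rather than by the model itself.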