All-in-one tool for ChatGLM3-6b: next-token latency with BS=16 is slower than before

Question

Fred-cell opened this issue a month ago · 2 comments

ipex-llm version: 2.5.0b20240510

Answer 1 · 2024-05-14T02:35:39.000Z

Please enable low-memory mode and check whether the issue still occurs. You can enable it by running export IPEX_LLM_LOW_MEM=1 in the shell before starting the all-in-one tool.
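If the benchmark is launched from a Python driver script rather than an interactive shell, the same setting can be passed through the child process's environment. This is a minimal sketch under stated assumptions: the helper names low_mem_env and run_benchmark are hypothetical, and the entry point run.py is assumed to be the all-in-one tool's script, so adjust it to your checkout. The only ipex-llm detail taken from this thread is the IPEX_LLM_LOW_MEM=1 variable.

```python
import os
import subprocess
import sys


def low_mem_env(base=None):
    """Return a copy of the environment with IPEX_LLM_LOW_MEM=1 set.

    The base mapping (os.environ by default) is copied, never modified in place.
    """
    env = dict(os.environ if base is None else base)
    env["IPEX_LLM_LOW_MEM"] = "1"  # variable suggested in this thread
    return env


def run_benchmark(script="run.py"):
    # Hypothetical entry point: "run.py" is an assumption about how the
    # all-in-one tool is started; point this at your actual script.
    subprocess.run([sys.executable, script], env=low_mem_env(), check=True)
```

Calling run_benchmark() from the tool's directory starts the benchmark with low-memory mode enabled and leaves the parent process's own environment unchanged.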

Answer 2 · 2024-05-14T05:39:41.000Z

Duplicate of #10994.
Closing this issue; further updates will be posted there.