intel-analytics/ipex-llm

all-in-one tool for ChatGLM3-6b: next token latency with BS=16 is slower than before

Fred-cell opened this issue · 2 comments

ipex-llm version: 2.5.0b20240510

Please enable low memory mode and check whether the issue persists. You can enable it by running export IPEX_LLM_LOW_MEM=1 before starting the benchmark.
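
For reference, a minimal sketch of that step in a POSIX shell. Only the variable name IPEX_LLM_LOW_MEM comes from this thread. The commented-out benchmark command is an assumption about how the all-in-one tool is launched and should be replaced with whatever command produced the original numbers.

```shell
# Enable low memory mode for this shell session and its child processes.
export IPEX_LLM_LOW_MEM=1

# Confirm the variable is exported before launching the benchmark.
printenv IPEX_LLM_LOW_MEM

# Rerun the same benchmark from this shell so it inherits the setting.
# Assumed invocation, adjust to your setup:
# python run.py
```

The variable has to be exported in the same shell that launches the benchmark. Setting it in a different terminal, or after the process has started, would not be expected to take effect.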

Duplicate of #10994.
Closing this issue; updates will be posted there.