all-in-one tool for ChatGLM3-6b: next token latency with BS=16 is slower than before
Fred-cell opened this issue · 2 comments
Fred-cell commented
lalalapotter commented
Please enable low memory mode and check whether this issue still occurs. You can enable low memory mode with export IPEX_LLM_LOW_MEM=1
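
A minimal sketch of applying this suggestion in a shell session, assuming the benchmark is launched from the same shell so that it inherits the variable. The variable name comes from the comment above; the `python run.py` line is only an assumption about how the all-in-one tool is started, so it is left commented out.

```shell
# Enable low memory mode for this shell session and its child processes.
export IPEX_LLM_LOW_MEM=1

# Confirm that a child process actually sees the variable before benchmarking.
sh -c 'echo "IPEX_LLM_LOW_MEM=${IPEX_LLM_LOW_MEM:-unset}"'

# Rerun the benchmark from this same shell (assumed invocation, adjust as needed):
# python run.py
```

Setting the variable in a different terminal, or after the benchmark process has already started, would have no effect on that process, so the next token latency comparison should be made on a fresh run.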