InftyAI/llmlite

vLLM not working as expected with ChatGLM2

kerthcet opened this issue

When chatting with ChatGLM2 via vLLM, every response comes back truncated after only a few tokens, e.g.

>>> result = chat.completion(
...     messages=[
...         [
...             ChatMessage(role="user", content="**共有多少人口?"),
...         ],
...         [
...             ChatMessage(role="user", content="**首富是谁"),
...         ],
...         [
...             ChatMessage(role="user", content="如何在三年内成为**首富"),
...         ],
...     ],
...     temperature=0.7,  # You can also override the configuration for each conversation.
...     max_tokens=2048,
... )
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 17.31it/s]
>>> print(result)
[' 根据2021年**国家统计局发布的数据,截至2020', ' **的首富目前的个人财富来自房地产和互联网行业。根据202', ' 成为首富是一个非常具有挑战性和难以预测的因素,而且这个目标并不是每个人']

The max_tokens setting does not seem to take effect: every reply is cut off mid-sentence even though max_tokens=2048 was requested.
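
One way to narrow this down: vLLM's SamplingParams defaults to max_tokens=16 when no value is supplied, which would produce exactly this kind of mid-sentence truncation if llmlite does not forward the argument. Below is a minimal sketch, assuming the stock THUDM/chatglm2-6b checkpoint and vLLM's offline LLM/SamplingParams API, that runs one of the prompts directly through vLLM with an explicit max_tokens:

from vllm import LLM, SamplingParams

llm = LLM(model="THUDM/chatglm2-6b", trust_remote_code=True)

# Pass sampling parameters explicitly; leaving them out makes vLLM fall back
# to its defaults (max_tokens=16), which matches the truncation shown above.
params = SamplingParams(temperature=0.7, max_tokens=2048)

outputs = llm.generate(["**共有多少人口?"], params)
print(outputs[0].outputs[0].text)

If this direct call returns a full-length answer, the truncation is happening in how llmlite builds its sampling parameters rather than in vLLM itself.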

/kind bug
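
If missing forwarding turns out to be the cause, the fix is probably just to thread the per-call arguments through to SamplingParams. A hypothetical sketch (the helper name and signature are assumptions, not llmlite's actual code):

from vllm import SamplingParams

# Hypothetical helper: whatever kwargs completion() accepts (temperature,
# max_tokens, top_p, ...) need to land on the SamplingParams handed to
# llm.generate(); otherwise vLLM silently applies its own defaults.
def build_sampling_params(temperature=1.0, max_tokens=2048, **kwargs):
    return SamplingParams(temperature=temperature, max_tokens=max_tokens, **kwargs)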