vLLM not working as expected with ChatGLM2
kerthcet opened this issue · 0 comments
kerthcet commented
When chatting with ChatGLM2 via vLLM, the responses come back truncated after only a handful of tokens, e.g.:
>>> result = chat.completion(
... messages=[
... [
... ChatMessage(role="user", content="**共有多少人口?"),
... ],
... [
... ChatMessage(role="user", content="**首富是谁"),
... ],
... [
... ChatMessage(role="user", content="如何在三年内成为**首富"),
... ],
... ],
... temperature=0.7, # You can also override the configurations in each conversation.
... max_tokens=2048,
... )
Processed prompts: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 17.31it/s]
>>> print(result)
[' 根据2021年**国家统计局发布的数据,截至2020', ' **的首富目前的个人财富来自房地产和互联网行业。根据202', ' 成为首富是一个非常具有挑战性和难以预测的因素,而且这个目标并不是每个人']
The max_tokens setting does not seem to take effect: even with max_tokens=2048, every completion is cut off mid-sentence after only a couple dozen tokens (e.g. the first answer stops at "截至2020").
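
For reference, my understanding is that max_tokens should end up in vLLM's SamplingParams, which bounds the number of generated tokens per request. Below is a minimal sketch of calling vLLM directly with the same limit, to help narrow down whether the value is being dropped before it reaches the engine. The model path and prompt are illustrative, and ChatGLM2 needs trust_remote_code=True since it ships custom model code:

from vllm import LLM, SamplingParams

# Illustrative only: load a ChatGLM2 checkpoint directly through vLLM.
llm = LLM(model="THUDM/chatglm2-6b", trust_remote_code=True)

# max_tokens caps the number of *generated* tokens per request;
# with 2048 the answers should not stop after ~20 tokens.
params = SamplingParams(temperature=0.7, max_tokens=2048)

outputs = llm.generate(["**共有多少人口?"], params)
for output in outputs:
    print(output.outputs[0].text)

If the direct call respects the limit, the truncation is likely happening in how chat.completion builds the sampling parameters rather than in vLLM itself.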
/kind bug