Tele-AI/TeleChat2

vLLM inference with telechat7b: what should max_model_len and max_tokens be set to?


from vllm import LLM, SamplingParams
import torch
model_path = "custom_model/TeleChat2-7B"
llm = LLM(model=model_path, trust_remote_code=True, tensor_parallel_size=1, max_model_len=19168)
prompts = ['你好']
sampling_params = SamplingParams(max_tokens=2048, temperature=0.0, repetition_penalty=1.03)  # a repetition_penalty of 1.03 is recommended
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)

Output:

{"{"{"{"{"{"
 1. 문자으로 만들면 문자가 반반반반이 나서 문자가 반......반반

I'd like to ask: what should max_model_len and max_tokens be set to for the 7B model?
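
As general vLLM background (not a TeleChat2-specific recommendation): the prompt tokens and the generated tokens together must fit within max_model_len, so max_tokens is effectively bounded by max_model_len minus the prompt length. Below is a minimal sketch of checking that budget before sampling; the 8192 context length is an assumption, not an official value.

from transformers import AutoTokenizer
from vllm import SamplingParams

MAX_MODEL_LEN = 8192  # assumed context length; check the model config for the real limit
tokenizer = AutoTokenizer.from_pretrained("TeleAI/TeleChat2-7B", trust_remote_code=True)

prompt = "你好"
prompt_len = len(tokenizer(prompt)["input_ids"])
budget = MAX_MODEL_LEN - prompt_len  # tokens left for generation

sampling_params = SamplingParams(
    max_tokens=min(2048, budget),  # never request more than the remaining context
    temperature=0.0,
    repetition_penalty=1.03,
)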

Hello, the prompts = ['你好'] line needs the chat template applied to it:

prompt = "你好"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([text], sampling_params)

We will fix the README content for this in a later iteration.

How should the tokenizer package and sampling_params be imported?

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TeleAI/TeleChat2-7B/", trust_remote_code=True)
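
SamplingParams is imported from vllm itself, as in the original snippet. For reference, a minimal end-to-end sketch putting the pieces together; the max_model_len value is an assumption rather than an official recommendation, and model_path is the local path used earlier in the issue.

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "custom_model/TeleChat2-7B"

# The tokenizer is only used here to render the chat template into a plain string.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

llm = LLM(model=model_path, trust_remote_code=True, tensor_parallel_size=1, max_model_len=8192)
sampling_params = SamplingParams(max_tokens=2048, temperature=0.0, repetition_penalty=1.03)

messages = [{"role": "user", "content": "你好"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate([text], sampling_params)
print(outputs[0].outputs[0].text)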