TeleChat-7B vLLM inference: what should max_model_len and max_tokens be set to?
Closed this issue · 3 comments
BarryAlllen commented
```python
from vllm import LLM, SamplingParams

model_path = "custom_model/TeleChat2-7B"
llm = LLM(model=model_path, trust_remote_code=True, tensor_parallel_size=1, max_model_len=19168)

prompts = ['你好']
# repetition_penalty of 1.03 is the recommended value
sampling_params = SamplingParams(max_tokens=2048, temperature=0.0, repetition_penalty=1.03)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)
```
Output:
```
{"{"{"{"{"{"
1. 문자으로 만들면 문자가 반반반반이 나서 문자가 반......반반
```
I'd like to ask: what should max_model_len and max_tokens be set to for the 7B model?
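For background on how the two values interact: in vLLM, max_model_len caps the total sequence length (prompt plus generated tokens), while SamplingParams.max_tokens caps only the generated portion. A minimal sketch of that constraint, reusing the model path and values from the snippet above:

```python
from transformers import AutoTokenizer

# Sketch: check that the prompt leaves room for generation.
# In vLLM, prompt tokens + max_tokens must fit within max_model_len.
tokenizer = AutoTokenizer.from_pretrained("custom_model/TeleChat2-7B", trust_remote_code=True)

prompt_len = len(tokenizer("你好")["input_ids"])
max_model_len = 19168   # value from the snippet above
max_tokens = 2048       # value from the snippet above
assert prompt_len + max_tokens <= max_model_len, "prompt + generation would exceed max_model_len"
```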
shunxing12345 commented
Hello, after
```python
prompts = ['你好']
```
you need to apply the chat template:
```python
prompt = "你好"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = llm.generate([text], sampling_params)
```
We will fix the README content to cover this in a future iteration.
BarryAlllen commented
How should the tokenizer and sampling_params be imported?
shunxing12345 commented
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TeleAI/TeleChat2-7B/", trust_remote_code=True)
```
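SamplingParams is imported from vllm, as in the first snippet. Putting the pieces together, a minimal end-to-end sketch (model path and parameter values are taken from the snippets above; adjust to your local setup):

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_path = "custom_model/TeleChat2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
llm = LLM(model=model_path, trust_remote_code=True, tensor_parallel_size=1, max_model_len=19168)

# Wrap the raw prompt in the chat template so the model sees the
# conversation format it was fine-tuned on.
messages = [{"role": "user", "content": "你好"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

sampling_params = SamplingParams(max_tokens=2048, temperature=0.0, repetition_penalty=1.03)
outputs = llm.generate([text], sampling_params)
print(outputs[0].outputs[0].text)
```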