[Question] Text generation by transformers pipeline is not working properly
HCTsai opened this issue · 2 comments
HCTsai commented
Sample code
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import GenerationConfig
from transformers import pipeline
import torch

model_name = "yentinglin/Taiwan-LLaMa-v1.0"
gpu_id = "cuda:0"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
).to(gpu_id)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, use_fast=False)

gen_config = GenerationConfig(
    max_length=1024,
    do_sample=False,
)
query = """
USER: 台灣有多少人口
ASSISTANT:
"""
# 1. Generate text with model.generate (works as expected)
input_ids = tokenizer.encode(query, return_tensors="pt").to(gpu_id)
ans = tokenizer.batch_decode(
    model.generate(input_ids, generation_config=gen_config)
)[0]
print(f"Question:{query}")
print("----------------------")
print(f"model.generate answer:\n{ans}")
# 2. Generate text with transformers.pipeline (not working properly)
gen_pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    device=gpu_id,
    return_full_text=False,
    generation_config=gen_config,
)
pipe_res = gen_pipe(query)
print("----------------------")
print("pipeline generated_text:\n")
print(pipe_res[0]["generated_text"])
pipeline generated_text:
���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������
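For reference, passing the generation settings directly as call-time kwargs instead of through a GenerationConfig is another variation of the same pipeline call (sketched below; untested whether it changes the output):

# Same pipeline, with generation kwargs passed at call time instead of via GenerationConfig.
pipe_res = gen_pipe(query, max_length=1024, do_sample=False)
print(pipe_res[0]["generated_text"])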
Any suggestions for a solution?
adamlin120 commented
Please use TGI or vLLM for inference.
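For reference, a minimal vLLM sketch (assuming vllm is installed and the model fits on one GPU; temperature 0 mirrors the do_sample=False setting above):

# Minimal vLLM sketch; the prompt string matches the one in the report above.
from vllm import LLM, SamplingParams

llm = LLM(model="yentinglin/Taiwan-LLaMa-v1.0", dtype="float16")
params = SamplingParams(temperature=0.0, max_tokens=1024)  # greedy decoding

prompt = "USER: 台灣有多少人口\nASSISTANT:"  # "What is the population of Taiwan?"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)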
adamlin120 commented
# pip install transformers>=4.34
# pip install accelerate
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="yentinglin/Taiwan-LLM-13B-v2.0-chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "你是一個人工智慧助理",  # "You are an AI assistant."
    },
    # "How does the northeast monsoon affect Taiwan's climate?"
    {"role": "user", "content": "東北季風如何影響台灣氣候?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
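To get only the model's reply without the echoed prompt, the same call can pass return_full_text=False (a small variation on the snippet above, using the pipeline parameter already shown in the original report):

# Return only the newly generated text, not the echoed prompt.
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95, return_full_text=False)
print(outputs[0]["generated_text"])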