MiuLab/Taiwan-LLM

[Question] Text generation by transformers pipeline is not working properly

HCTsai opened this issue · 2 comments

HCTsai commented

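Text generation through the transformers pipeline does not work properly: model.generate returns a normal answer, but the pipeline returns garbled text for the same prompt and generation config.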

Sample code

from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import GenerationConfig
from transformers import pipeline
import torch

model_name = "yentinglin/Taiwan-LLaMa-v1.0"
gpu_id = "cuda:0"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
).to(gpu_id)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, use_fast=False)
# Greedy decoding; max_length caps prompt + generated tokens together.
gen_config = GenerationConfig(
    max_length=1024,
    do_sample=False,
)

# Prompt in Traditional Chinese: "What is the population of Taiwan?"
query = """
USER: 台灣有多少人口
ASSISTANT:
"""

# 1. Generate text with model.generate (works as expected)
input_ids = tokenizer.encode(query, return_tensors="pt").to(gpu_id)
output_ids = model.generate(input_ids, generation_config=gen_config)
ans = tokenizer.batch_decode(output_ids)[0]
print(f"Question:{query}")
print("----------------------")

print(f"model.generate answer:\n{ans}")

# 2. Generate text with transformers.pipeline (does not work properly)
gen_pipe = pipeline(
    task="text-generation",
    model=model, tokenizer=tokenizer, device=gpu_id,
    return_full_text=False,
    generation_config=gen_config,
)

pipe_res = gen_pipe(query)

print("----------------------")
print("pipeline generated_text:\n")
print(pipe_res[0]["generated_text"])

Output:

pipeline generated_text:

���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������
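Each � above is U+FFFD, the Unicode replacement character, which a UTF-8 decoder substitutes for bytes that do not form a complete, valid character. Chinese characters take 3 bytes each in UTF-8, and tokenizers with byte-level fallback pieces (such as the LLaMA tokenizer) can split one character across several tokens. The minimal sketch below only illustrates the symptom: decoding those bytes together restores the text, while decoding them one at a time yields nothing but �. The thread does not confirm that this is what the pipeline is doing here.

```python
# Shows how U+FFFD ("�") appears when the UTF-8 bytes of CJK text are
# decoded piece by piece instead of as one sequence.
text = "台灣"  # "Taiwan"
raw = text.encode("utf-8")  # 6 bytes: 3 per character

# Decoding the full byte sequence recovers the original text.
whole = raw.decode("utf-8")

# Decoding each byte on its own: no single byte is a complete UTF-8
# character here, so every piece becomes the replacement character.
piecewise = "".join(bytes([b]).decode("utf-8", errors="replace") for b in raw)

print(whole)      # 台灣
print(piecewise)  # ������
```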

Any suggestions for solutions?

Please use TGI (Text Generation Inference) or vLLM for inference.

# pip install "transformers>=4.34"   (quote it, or the shell treats ">" as a redirect)
# pip install accelerate

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="yentinglin/Taiwan-LLM-13B-v2.0-chat", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "你是一個人工智慧助理",  # "You are an AI assistant."
    },
    {"role": "user", "content": "東北季風如何影響台灣氣候?"},  # "How does the northeast monsoon affect Taiwan's climate?"
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
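To make the apply_chat_template step concrete: with tokenize=False it returns a single prompt string, and add_generation_prompt=True appends the opening of an assistant turn so the model continues as the assistant. The sketch below is a hypothetical pure-Python stand-in (render_chat is not a transformers function), and its "USER: ... ASSISTANT:" layout is an assumption borrowed from the prompt in the original question; the actual Taiwan-LLM v2.0 format is whatever the tokenizer's chat_template defines.

```python
# Hypothetical stand-in for tokenizer.apply_chat_template(messages,
# tokenize=False, add_generation_prompt=True). The "USER: ... ASSISTANT:"
# layout is an assumption, not the model's real template.
def render_chat(messages, add_generation_prompt=True):
    parts = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "system":
            parts.append(content)  # system text goes first, unlabeled
        elif role == "user":
            parts.append(f"USER: {content}")
        elif role == "assistant":
            parts.append(f"ASSISTANT: {content}")
        else:
            raise ValueError(f"unknown role: {role!r}")
    if add_generation_prompt:
        # Open an assistant turn so generation continues as the assistant.
        parts.append("ASSISTANT:")
    return " ".join(parts)


messages = [
    {"role": "system", "content": "你是一個人工智慧助理"},  # "You are an AI assistant."
    {"role": "user", "content": "東北季風如何影響台灣氣候?"},
]
print(render_chat(messages))
# 你是一個人工智慧助理 USER: 東北季風如何影響台灣氣候? ASSISTANT:
```

Rendering the whole conversation to one string keeps the prompt format in a single place, so the same string can be passed to the pipeline, to model.generate, or to a serving engine.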