Unexpected Prompt Results on GPU Execution for Bllossom/llama-3-Korean-Bllossom-70B-gguf-Q4_K_M
smartdolphin commented
When the Bllossom/llama-3-Korean-Bllossom-70B-gguf-Q4_K_M model runs on CPU, the prompt below produces the expected answer.
However, when the same model runs on GPU (n_gpu_layers=-1), the response is incoherent, as shown in the results at the end.
Is this a known issue?
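For reference, the only intended difference between the two runs is the n_gpu_layers value passed to Llama. A minimal sketch of the toggle (assuming the CPU run used n_gpu_layers=0, which is also the default, and the same local GGUF path as in the full script below):

from llama_cpp import Llama

MODEL_PATH = 'models/llama-3-Korean-Bllossom-70B-gguf-Q4_K_M.gguf'
# n_gpu_layers=0 keeps every layer on the CPU (output is correct);
# n_gpu_layers=-1 offloads all layers to the GPU (output is garbled, see results below).
cpu_model = Llama(model_path=MODEL_PATH, n_ctx=512, n_gpu_layers=0)
gpu_model = Llama(model_path=MODEL_PATH, n_ctx=512, n_gpu_layers=-1)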
- Source code
import os
from llama_cpp import Llama
from transformers import AutoTokenizer
model_id = 'Bllossom/llama-3-Korean-Bllossom-70B-gguf-Q4_K_M'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = Llama(
    model_path='models/llama-3-Korean-Bllossom-70B-gguf-Q4_K_M.gguf',
    n_ctx=512,
    n_gpu_layers=-1  # -1 offloads all model layers to the GPU
)
PROMPT = \
'''당신은 유용한 AI 어시스턴트입니다. 사용자의 질의에 대해 친절하고 정확하게 답변해야 합니다.
You are a helpful AI assistant, you'll need to answer users' queries in a friendly and accurate manner.'''
instruction = '2x + 3 = 7이라면 x는?'  # "If 2x + 3 = 7, what is x?"
messages = [
    {"role": "system", "content": f"{PROMPT}"},
    {"role": "user", "content": f"{instruction}"}
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
generation_kwargs = {
    "max_tokens": 512,
    "stop": ["<|eot_id|>"],
    "echo": True,  # echo the prompt in the output
    "top_p": 0.9,
    "temperature": 0.6,
}
response_msg = model(prompt, **generation_kwargs)
print(response_msg['choices'][0]['text'])
- Prompt results (GPU run)
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
당신은 유용한 AI 어시스턴트입니다. 사용자의 질의에 대해 친절하고 정확하게 답변해야 합니다.<|eot_id|><|start_header_id|>user<|end_header_id|>
2x + 3 = 7이라면 x는?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
어떤 것이 있습니다. (roughly: "There are some things.")
1)
Instead of solving the equation (the expected answer is x = 2), the GPU run returns this incoherent fragment; the CPU run answers the same prompt correctly.
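Not part of the original report, but a quick way to check whether the divergence comes from the GPU-offloaded layers rather than from sampling noise is to run the same prompt on both backends with a fixed seed and temperature 0 (greedy decoding) and compare the outputs. A sketch, assuming the same model path and chat template as above:

from llama_cpp import Llama
from transformers import AutoTokenizer

MODEL_PATH = 'models/llama-3-Korean-Bllossom-70B-gguf-Q4_K_M.gguf'
tokenizer = AutoTokenizer.from_pretrained('Bllossom/llama-3-Korean-Bllossom-70B-gguf-Q4_K_M')

messages = [
    {"role": "system", "content": "당신은 유용한 AI 어시스턴트입니다."},
    {"role": "user", "content": "2x + 3 = 7이라면 x는?"},  # "If 2x + 3 = 7, what is x?"
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = {}
for layers in (0, -1):  # 0 = pure CPU, -1 = offload every layer to the GPU
    llm = Llama(model_path=MODEL_PATH, n_ctx=512, n_gpu_layers=layers, seed=0)
    out = llm(prompt, max_tokens=128, temperature=0.0, stop=["<|eot_id|>"])  # greedy decoding
    outputs[layers] = out['choices'][0]['text']
    print(f"n_gpu_layers={layers}:\n{outputs[layers]}\n")

# If the two texts differ even under greedy decoding, the problem lies in the
# GPU-offloaded computation itself, not in sampling randomness.
print("identical:", outputs[0] == outputs[-1])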