chujiezheng/chat_templates

zephyr seems to have some problems

Closed this issue · 2 comments

Thanks for your work! It is very useful in my research. I found that zephyr seems to have some problems.

Here's how I use for zephyr:

from transformers import AutoTokenizer

toker = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [
    {'role': 'system', 'content': 'This is a system prompt.'},
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]

chat_template = open('./chat_templates/zephyr.jinja').read()
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

The result is

<|system|>
This is a system prompt.</s><|user|>
This is the first user input.</s><|assistant|>
This is the first assistant response.</s><|user|>
This is the second user input.</s><|assistant|>

However, it should be

<|system|>
This is a system prompt.</s>
<|user|>
This is the first user input.</s>
<|assistant|>
This is the first assistant response.</s>
<|user|>
This is the second user input.</s>
<|assistant|>

Can you check? Thanks!

Maybe this is a solution:

chat_template = chat_template.replace('    ', '').replace('}\n', '}')

Just fixed it. Thank you.