chujiezheng/chat_templates

[Potential Bug] Only BOS token when the user only provides system message for LLAMA-3

yangalan123 opened this issue · 15 comments

Hi,

Thanks for maintaining this great repo. I recently find a potential bug: if the user only input the system message, applying LLAMA-3-Instruct template will only return <|begin_of_text|> and completely ignore the provided system message. I guess the problem is in {% set loop_messages = messages[1:] %} when len(messages)==1.

Here is a minimal example to reproduce:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
chat_template_path = "chat_templates/llama-3-instruct.jinja"
chat_template = open(chat_template_path).read()
chat_template = chat_template.replace('    ', '').replace('\n', '')
tokenizer.chat_template = chat_template
prompts = ["Please help me classify:", "Hello!"]
processed_prompts = [tokenizer.apply_chat_template([{"role": "system", "content": p},  ], tokenize=False, add_generation_prompt=True) for p in prompts]
print(processed_prompts)

The result is:

['<|begin_of_text|>', '<|begin_of_text|>']

It is more of an analysis experiment -- we may want to analyze the controlling power of system prompts, and the necessity of the existence of system prompts. We can also try to investigate whether we can incorporate user inquiry as part of the system prompt. This may be especially helpful in the setting of agent initialization.

But even if we do not consider the specific applications, I still feel this is problematic -- why do we only get "bos_token" when we only input the system message? When setting up "GPTs" at the GPT store, basically we only give a system prompt.

I see. I think in most cases, the user input is still necessary, and the system message can be optional. I will add checks to ensure the chat_messages contain at least one user input.

I think at least, when there is only a system message, the input should be ended with [EOS] as well. While I agree my situation perhaps is not "in most cases", I do want to note that 1) there does exist a recent trend that people start to evaluate at system prompts, like: [1], [2]. 2) Whenever an unexpected input occurs, at least some default choice should be implemented. I do not understand the logic that the setup of the system prompt depends on the user input. (Ideally, they should be independent.) It is weird and I do not believe many LLMs intend that.

Just a side question -- do you think you should collaborate with Huggingface to incorporate your template here to be part of their tokenizer natively? I feel this is somewhat not convenient in that every time I need to read the template here and set the tokenizer somehow.

For 1), I think the reference [1] does not adopt official chat templates, and the reference [2] always includes a user question.

For 2), I think it is not "the system prompt depending on the user input", but "prepending the system prompt before the user input". In other words, the system prompt becomes non-sensical if there is no user input.

In fact, I generally follow the logic of looping the non-system messages in llama-2-chat.

For the side question, I will give it a look and see whether it is possible to collaborate with the HF team.

I think there is a misunderstanding -- I cite those papers to prove I am not the only one looking at system prompts. They may conduct experiments in one way or another, but as people are getting interested in system prompt, I think my use case is actually not that rare.

I am confused about the comment that "the system prompt becomes non-sensical if there is no user input" -- why is that? Suppose I write "You are a helpful assistant" as a system prompt (a very commonly used one), it is valid and bears "alignment"/"principled" constraints for follow-up generations even without user inputs. It already changes the model output distribution somehow. Such a prompt is agnostic to what the user would write.

I just do not understand, why it is hard to implement some logic like (perhaps because I am new to Jinja),

if len(messages)==1 then....
else [default]

I am implementing this logic. I am just not sure in which cases the messages can look like

[
    {"role": "system", "content": "..."}
]

but not

[
    {"role": "user", "content": "..."}
]

or

[
    {"role": "system", "content": "..."},
    {"role": "user", "content": "..."}
]

Great, now it seems we are on the same page. Thanks for the effort of implementing this logic. Let's discuss more on the implementation:

My case is actually:

[
    {"role": "system", "content": "..."}
]

But I think the case for:

[
    {"role": "user", "content": "..."}
]

is also not that rare -- not many people, especially non-NLP people knows the concept of "system prompt". So I would say both situations can happen.

Yes. Currently, the templates have supported

[
    {"role": "user", "content": "..."}
]

Again, my question is still when

[
    {"role": "system", "content": "..."}
]

will happen? If there is no user input, I think we may not need the "chat" template (as there is no chat).

In other words, I think the user input is necessary when you apply the chat template, no matter whether you prepend the system message. This is the premise of the current implementation.

I have talked about the potential application of system-only messages (I intentionally distinguish the usage of "messages" and "prompts" here. The "prompt" is the formatted "message".) in research. So let's talk more in the chat setting:

Suppose we want to build a streaming chatbot, and we have a long list of regulations (like in Constitutional AI, perhaps should not be trained to be part of the model for safety/legal/privacy concerns) to ask the model to conform with. Following one current popular deployment, we should tokenize this system-only message early and compute KV-cache correspondingly to save deployment costs. In that case, we cannot assume any user inquiry and we should get a properly formatted prompt even with system-only messages. We input this prompt to the model and then we can compute the KV cache.

I understand the case of KV-cache. That is an excellent point. Thank you for sharing!

I just pushed the new implementation for simplification and unification. You may give it a try. But for the KV-cache case, one-by-one implementation is still needed for each model.

And once again, I do not recommend to pass

[
    {"role": "system", "content": "..."}
]

to the model. There may be unexpected outputs if no user inputs are provided.

Great work! Thanks for your contribution! I think it would be better if you could write a gentle caution to the user that some unexpected outputs can happen in some cases :-)

Would be happy to create one PR (ninja is not within my expertise so I hesitate to do that, sorry) or you can just write it. I actually think our discussion here has a lot of values for both research and application purposes.

Thanks for your advice. I will add this comment :)

Thanks!