Potential vulnerability: Control token injection through Jinja templates in apply_chat_template

Question

Potential vulnerability: Control token injection through Jinja templates in apply_chat_template

pluiez opened this issue 4 months ago · 2 comments

There is a potential security vulnerability in the apply_chat_template function within the Tokenizers library. The current implementation, which leverages Jinja templates and processes the conversation as text before tokenization, introduces a risk of control token injection attacks.

In real-world chat applications, malicious actors could exploit this vulnerability by injecting control tokens disguised within Jinja templates. This could potentially allow them to manipulate the AI assistant's behavior, such as redefining system prompts and disrupting conversation flow.

Current Approach and Concerns:

Jinja Template Processing: The apply_chat_template function first generates text by applying Jinja templates to the conversation history.
Tokenization on Generated Text: Subsequently, this generated text is tokenized, leaving it vulnerable to control token injection within the Jinja templates.

Suggested Approach:

Individual Message Tokenization: Propose an alternative approach where each individual message in the conversation history is tokenized independently.
Context-Aware Special Token Insertion: Special tokens, such as <|system_prompt|>, would then be inserted at appropriate positions based on the context after tokenization.

Answer 1 · 2024-03-11T10:53:31.000Z

In real-world chat applications, malicious actors could exploit this vulnerability by injecting control tokens disguised within Jinja templates. This could potentially allow them to manipulate the AI assistant's behavior, such as redefining system prompts and disrupting conversation flow.

Not a bug nor security. Tokens do NOT have privileges linked to them. Models will output what they output, and there is not hierarchy in the tokens for the model itself. Inserting tokens for users by means of text, is a feature. Limiting them to a subset of tokens might be desirable, but will never be any kind of substantial defense against prompt injection or jailbreak (just look at how easy to "jailbreak" ANY model.)

This repo also does not have jinja as a dependency, nor does it apply templates.