[Prompt Template] Silent bug - Performance Killer
timothylimyl opened this issue · 0 comments
Hi,
I found that the prompts generated from the datasets (e.g. MMLU) are not wrapped in the model's prompt template, so instruction-tuned models never see the input format they were fine-tuned on, and the scores you get out of them are degraded. If you look into the mmlu.py script, you will see that the prompt the model is run on never goes through the template the model expects:
```python
def evaluate(args, subject, model: EvalModel, dev_df, test_df):
    ...
    prompt = prompt_input_template(prompt, model)  # personally added
    pred = model.run(prompt)
    ...
```
And in modeling.py, run() feeds whatever prompt it receives straight into the tokenizer, so no template is applied anywhere along the way:
```python
def run(self, prompt: str, **kwargs) -> str:
    self.load()
    inputs = self.tokenizer(prompt, return_tensors="pt").to(self.device)
    if "RWForCausalLM" in str(type(self.model)):
        inputs.pop("token_type_ids")  # Not used by Falcon model
    outputs = self.model.generate(
        **inputs,
        max_new_tokens=self.max_output_length,
        pad_token_id=self.tokenizer.eos_token_id,  # Avoid pad token warning
        **kwargs,
    )
    batch_size, length = inputs.input_ids.shape
    return self.tokenizer.decode(outputs[0, length:], skip_special_tokens=True)
```
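To make the failure mode concrete, here is roughly the difference between what the model currently sees and what a chat/instruct-tuned model was trained to see. Both strings below are illustrative only (the MMLU header is paraphrased from the standard few-shot format, and the [INST] markers are the Llama-2-chat convention), not copied from this repo:

```python
# Roughly what mmlu.py builds and model.run() tokenizes directly:
raw_prompt = (
    "The following are multiple choice questions (with answers) about anatomy.\n\n"
    "...\n"
    "Answer:"
)

# What an instruct/chat-tuned model such as Llama-2-chat expects instead:
# its own instruction markers wrapped around the same text.
templated_prompt = f"[INST] {raw_prompt} [/INST]"
```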
I personally added a prompt_input_template() function to solve this issue. Note that you will still get correct answers some of the time, which is exactly why this is a silent bug (a common ML failure mode): nothing crashes, the scores are just lower than they should be.
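A minimal sketch of the kind of helper I mean is below. The model names and template strings are only illustrative (the real mapping has to match whatever each model was actually fine-tuned with), and I key on a plain model name here for brevity, whereas in the snippet above I pass the EvalModel and read the name off it:

```python
def prompt_input_template(prompt: str, model_name: str) -> str:
    # Illustrative per-model templates; verify each one against the model card.
    templates = {
        "llama2-chat": "[INST] {prompt} [/INST]",
        "vicuna": "USER: {prompt}\nASSISTANT:",
    }
    template = templates.get(model_name)
    if template is None:
        # Base pretrained / unknown models: leave the prompt untouched.
        return prompt
    return template.format(prompt=prompt)
```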
It will be hard to solve this generically for every model, because Hugging Face only recently addressed the problem by adding chat templates to the tokenizer (ref: https://huggingface.co/docs/transformers/chat_templating). Once the open-source community adopts this (which I think will eventually be the case), you can use apply_chat_template to solve the issue. For now, you can add individual per-model mappings to templatize the prompts, along the lines of the sketch above. For base pretrained models or API calls, this is not an issue.
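For tokenizers that already ship a chat template, a minimal sketch of the apply_chat_template route could look like the following. The helper name and the model checkpoint are my own placeholders, and it assumes a transformers version recent enough to have apply_chat_template:

```python
from transformers import AutoTokenizer

def wrap_with_chat_template(prompt: str, tokenizer) -> str:
    # Only possible if the tokenizer actually ships a chat template.
    if getattr(tokenizer, "chat_template", None) is None:
        return prompt  # e.g. base pretrained models: fall back to the raw prompt
    messages = [{"role": "user", "content": prompt}]
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

# Usage sketch (checkpoint name is just an example):
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
templated = wrap_with_chat_template("Question: ...\nAnswer:", tokenizer)
```

This keeps the eval scripts model-agnostic: each tokenizer carries its own template, and models without one (base pretrained or API-backed) pass through unchanged.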