ngxson/wllama

qwen returns empty string

flatsiedatsie opened this issue · 4 comments

I noticed something interesting where this tiny model returns an empty string whenever I query it:

https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat-GGUF/resolve/main/qwen1_5-0_5b-chat-q4_0.gguf

Hey, @flatsiedatsie, could it be related to the prompt you're using?

I tried it and got a good response.

Here's the prompt I used:

<|im_start|>user
Explain quantum computing like I'm five<|im_end|>
<|im_start|>assistant

Could you try again with this one?
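
For comparison with a generated prompt, the layout above can also be built by hand. This is only a minimal sketch: `buildChatMLPrompt` is a hypothetical helper, not part of wllama or Transformers.js, and it assumes the model expects exactly the ChatML markers shown in the prompt above, with no system message.

```javascript
// Hypothetical helper (not part of wllama or Transformers.js): formats chat
// messages using the ChatML markers from the prompt above.
function buildChatMLPrompt(messages, addGenerationPrompt = true) {
  let prompt = "";
  for (const { role, content } of messages) {
    // Each turn is wrapped as <|im_start|>role\ncontent<|im_end|>\n
    prompt += `<|im_start|>${role}\n${content}<|im_end|>\n`;
  }
  if (addGenerationPrompt) {
    // An open assistant turn tells the model it is its turn to answer.
    prompt += "<|im_start|>assistant\n";
  }
  return prompt;
}

const examplePrompt = buildChatMLPrompt([
  { role: "user", content: "Explain quantum computing like I'm five" },
]);
console.log(examplePrompt);
```

Printing the result next to whatever the tokenizer's chat template produces makes it easy to spot a missing newline or a missing trailing assistant turn.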


Console info, for reference:
image

@felladrin Thank you for having a look. I didn't have time to look into the details, but it seems that Qwen models are quite sensitive to chat templates (because they are so small, there is little room for error).

Please let me know if that works for you @flatsiedatsie

Thanks for testing on your end.

I managed to get output once, but only once.

I'm using the tokenizer from Transformers.js to generate the prompts. There was an issue with that, but it was fixed a while ago as far as I can tell. This process uses Jinja2 templates which are stored on HuggingFace.

const tokenizer = await AutoTokenizer.from_pretrained(config_url);
return tokenizer.apply_chat_template(messages, { tokenize: false, return_tensor: false, add_generation_prompt: true });

Your hint about the sensitivity to the prompt is very useful, though. I'm running some tests now.
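
One way to run such tests is to inspect the prompt string before it reaches the model. The sketch below is an assumption-laden illustration, not something from wllama or Transformers.js: `findChatMLProblems` is a hypothetical helper, and it assumes the ChatML layout from the working prompt earlier in this thread, ending with an open assistant turn.

```javascript
// Hypothetical debugging helper: lists structural problems in a ChatML
// prompt string. It assumes the prompt should end with an open assistant turn.
function findChatMLProblems(prompt) {
  const problems = [];
  const generationPrompt = "<|im_start|>assistant\n";

  if (prompt.trim().length === 0) {
    problems.push("prompt is empty");
  }

  const endsWithGenerationPrompt = prompt.endsWith(generationPrompt);
  if (!endsWithGenerationPrompt) {
    problems.push("missing trailing assistant generation prompt");
  }

  // Count marker occurrences; the open assistant turn adds one extra start.
  const starts = prompt.split("<|im_start|>").length - 1;
  const ends = prompt.split("<|im_end|>").length - 1;
  const expectedStarts = endsWithGenerationPrompt ? ends + 1 : ends;
  if (starts !== expectedStarts) {
    problems.push(`unbalanced markers: ${starts} <|im_start|> vs ${ends} <|im_end|>`);
  }

  return problems;
}

const generated = "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\n";
console.log(findChatMLProblems(generated));
```

An empty array means the structure matches the assumed layout; it says nothing about whether the model will actually produce output.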

It's working now. I'm not even sure why :-D

Screenshot 2024-05-10 at 23 16 20