ggerganov/llama.cpp

Possible stopping issues and bad asterisk tokenization? (GGUF related)

Dampfinchen opened this issue · 3 comments

Copied from my Hugging Face discussion (https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/discussions/106)

With Llama 3 Instruct, I have noticed rare cases where the model emits <|eot_id|> at inappropriate times, namely in the middle of a sentence. This is what it looks like:

"*She seems to relax even further, her tension easing as she sits next to you. She gazes out at the rain, her eyes lost in thought, as the sound of the droplets hitting the awning creates a soothing background noise. Every so"

In this roleplay case, it stops when it wants to say "Every so often". The frontend is set up correctly with the correct L3 template. I've noticed the model is particularly prone to stopping at a sentence like this when you're using a relatively high rep pen. The model also sometimes forgets to put the closing asterisk at the end of a sentence.
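To illustrate why a high rep pen could plausibly matter here, below is a minimal sketch of the widely used CTRL-style repetition penalty (positive logits of already-seen tokens are divided by the penalty, negative ones multiplied). The token ids and logit values are made up for illustration and are not taken from Llama 3; the exact penalty implementation in a given frontend or backend may differ.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Return a copy of logits with already-seen tokens made less likely."""
    out = list(logits)
    for tid in set(seen_token_ids):
        if out[tid] > 0:
            out[tid] /= penalty   # shrink positive logits
        else:
            out[tid] *= penalty   # push negative logits further down
    return out


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical toy vocabulary: ids and logits are assumptions for this sketch.
OFTEN, EOT, OTHER = 0, 1, 2
logits = [4.0, 3.0, 1.0]   # the model prefers " often" over <|eot_id|>
seen = [OFTEN]             # " often" already appeared earlier in the chat

for penalty in (1.0, 1.1, 1.5):
    adjusted = apply_repetition_penalty(logits, seen, penalty)
    winner = {OFTEN: "often", EOT: "<|eot_id|>", OTHER: "other"}[argmax(adjusted)]
    print(f"rep pen {penalty}: greedy pick = {winner}")
```

In this toy setup the end-of-turn token only overtakes " often" once the penalty is high enough, which matches the observation that the early stop shows up more with a higher rep pen. It does not show that this is the actual cause.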

Now, I'm using a recent GGUF here with the latest tokenizer fixes. Sadly I cannot say whether the FP16 model suffers from the same issues, i.e. whether it is a GGUF issue or not. I do not have a PC capable of testing the full FP16 model.

If anyone wants to test this, I have prepared a GGUF of my latest merge (https://huggingface.co/Dampfinchen/Llama-3-8B-Ultra-Instruct-SaltSprinkle-Q8_0-GGUF), which amplifies Meta Instruct and as such makes this issue more reproducible than it otherwise would be. When I set a higher rep pen and lead the model to write "Every so often", it takes a few tries, but I'd say it's reproducible. I did encounter this issue with the official Llama 3 Instruct and at low rep pen as well, just much less often.

You might be wondering whether I had the issue before the EOS token was changed from 128001 (end_of_text) to 128009 (eot_id). Yes, the issue was noticeable before then, so that's not the cause of the problem.

Does anyone have insight into this?

It does seem like a GGUF problem. With exl2 I'm not getting stop issues, nor is the model forgetting or misplacing asterisks.


@ggerganov In my testing, wrong asterisk formatting happens a lot with GGUF. When I try the same model with the same settings with exl2, there are no issues like that. Is there anything in the way ggml tokenizes asterisks and the stop token that could contribute to that, or to stopping in the middle of a sentence?

It seems exl2 just got lucky yesterday, even though I made sure to use a lot of regenerations. Today the asterisk formatting problem happened with exl2 as well, as did the stop token issue.

Thus, it's not a GGUF issue. I apologize for the inconvenience.