Is it possible to exclude the `<|eot_id|>` token?
I know there are already some options for llama.cpp to exclude certain tokens, but I'm currently running into `<|eot_id|>` with Llama3 Instruct in Interactive mode: the `<|eot_id|>` token keeps showing up in the output (should_output_eos has no effect).
I could do some string filtering, but before I set up a complicated system of string caching and live-detection, I thought I'd ask.
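For context, the filter I'd otherwise build would look roughly like this: a plain Python sketch of stream-side stop-string filtering. The names here are mine for illustration, not part of godot-llm.

```python
# Minimal sketch of filtering a stop string such as "<|eot_id|>" out of a
# streamed response. The tricky part is caching a suffix that might still
# grow into the stop string across chunk boundaries.
STOP = "<|eot_id|>"

class StopStringFilter:
    def __init__(self, stop: str):
        self.stop = stop
        self.buffer = ""
        self.done = False

    def feed(self, chunk: str) -> str:
        """Return only the part of the stream that is safe to display."""
        if self.done:
            return ""
        self.buffer += chunk
        idx = self.buffer.find(self.stop)
        if idx != -1:
            # Stop string completed: emit what precedes it, drop the rest.
            self.done = True
            out, self.buffer = self.buffer[:idx], ""
            return out
        # Hold back the longest suffix that is a prefix of the stop string.
        keep = 0
        for n in range(1, min(len(self.stop), len(self.buffer)) + 1):
            if self.stop.startswith(self.buffer[-n:]):
                keep = n
        out = self.buffer[:len(self.buffer) - keep]
        self.buffer = self.buffer[len(self.buffer) - keep:]
        return out

f = StopStringFilter(STOP)
print(f.feed("Hello <|eot"), end="")  # emits "Hello ", caches "<|eot"
print(f.feed("_id|> hidden text"))    # stop string completes, tail is dropped
```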
I think Should Output Eos is actually the third Should Output... option; you may still have it on 😆
Yep, that's Should Output Eos, but I've tried it with both on and off and the token is shown either way. Should I prepare a sample project, or is there another way?
I have tested on my side and it definitely works here. Could you download the new Godot LLM Template (either from the asset library or from here), then open the application -> Text Generation -> change None to Person -> Generate, and see if `<|eot_id|>` is there or not?
It could also well be this issue: ggerganov/llama.cpp#6772, so the problem may come from the model side, but I think it has already been fixed. If the issue persists, could you try one of the "fixed" models here?
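If you want to double check from the model side, one way is to look at what the GGUF metadata declares as the EOS token. Here is a rough sketch using the `gguf` Python package from llama.cpp's gguf-py; the field-access details are an assumption and may differ between gguf versions:

```python
# Print the EOS token id and string a GGUF file declares, to spot the
# metadata problem from ggerganov/llama.cpp#6772.
# Usage: python check_eos.py model.gguf
import sys
from gguf import GGUFReader

reader = GGUFReader(sys.argv[1])

def scalar(name):
    # For scalar fields, data[0] indexes the part holding the value.
    field = reader.fields.get(name)
    return None if field is None else int(field.parts[field.data[0]][0])

eos_id = scalar("tokenizer.ggml.eos_token_id")
print("eos_token_id:", eos_id)

# tokenizer.ggml.tokens is a string array; data[i] indexes the raw bytes
# of the i-th token string.
tokens = reader.fields.get("tokenizer.ggml.tokens")
if tokens is not None and eos_id is not None:
    raw = tokens.parts[tokens.data[eos_id]]
    print("eos token:", bytes(raw).decode("utf-8"))
```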
You're right, it was a model issue; the model you linked fixed it. What's weird is that I used the linked Meta-Llama-3-8B-Instruct-Q5_K_M.gguf from the README. So on the one hand I'd think it's best to change the link, but on the other hand you didn't have the issue with the same model.
It may well be that we downloaded different versions of the model from the same repo. I have added an FAQ section to the README to point out that model versions can be a problem even if the model is the same.
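If someone wants to verify which version of a file they actually have, comparing a checksum against the one shown on the model repo's file page is usually enough. For example, in plain Python with no extra dependencies:

```python
# Print the SHA-256 of a (possibly multi-GB) model file in 1 MiB blocks.
# Usage: python checksum.py model.gguf
import hashlib
import sys

h = hashlib.sha256()
with open(sys.argv[1], "rb") as f:
    for block in iter(lambda: f.read(1 << 20), b""):
        h.update(block)
print(h.hexdigest())
```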