Completion seems to ignore EOS token
hvisser opened this issue · 3 comments
I'm using your library with phi-2 on an Android device (after updating the llama.cpp version). I've noticed that generation seems to ignore or skip the end-of-stream token somehow. For example, here's the output from llama.cpp itself:
prompt:
<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
hello<|im_end|>
<|im_start|>assistant
output:
<|im_start|>system
You are a helpful assistant
<|im_start|>user
hello
<|im_start|>assistant
Hello! How can I assist you today? [end of text]
When using jllama, it looks like this:
<|im_start|>system
You are a helpful assistant
<|im_start|>user
hello
<|im_start|>assistant
Hello! How can I assist you today?<|im_end|>
<|im_start|>user
[more output]
Note that the generated text includes the <|im_end|> marker itself, followed by the start token for the next user turn, which the model is generating on its own ;)
Looking at the llama.cpp source, generation seems to stop here https://github.com/ggerganov/llama.cpp/blob/master/examples/main/main.cpp#L896 and there's a similar condition in jllama here: https://github.com/kherud/java-llama.cpp/blob/master/src/main/cpp/jllama.cpp#L831 but since I'm not very familiar with these bindings or with llama.cpp internals, I haven't figured out what's different about that condition.
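To make the comparison concrete, here's roughly what both loops are checking. This is a paraphrased sketch, not the exact code from either file, and the llama_token_eos signature is the one from the llama.cpp versions of around that time:

```cpp
#include "llama.h"

// Paraphrased stop condition: generation ends when the sampled token id
// equals the model's EOS id. If "<|im_end|>" comes out as plain text
// (e.g. a "<" token followed by more character pieces), this check never
// matches and the loop keeps going into the next turn.
static bool is_end_of_generation(const llama_model * model, llama_token sampled) {
    return sampled == llama_token_eos(model);
}
```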
Debugging this some more, it seems the token produced when it emits the "faulty" <|im_end|> is not the end-of-stream token but the token for <, so it looks like the marker is being generated as plain text rather than as the special token.
There's another odd difference: from the command line the prompt tokenizes to 19 tokens, while the same prompt on my Android device tokenizes to 51 tokens, and the special tokens aren't recognized as such. Maybe that is the source of the issue?
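For reference, this is the kind of check I did; a minimal sketch assuming the llama.cpp C API of that period (model, text, text_len, tokens, n_max_tokens, add_bos, special), which later versions renamed:

```cpp
#include "llama.h"
#include <string>
#include <vector>

// Tokenize helper around the raw C API of that era.
static std::vector<llama_token> tokenize(const llama_model * model,
                                         const std::string & text,
                                         bool add_bos, bool special) {
    std::vector<llama_token> tokens(text.size() + (add_bos ? 1 : 0));
    int n = llama_tokenize(model, text.c_str(), (int) text.size(),
                           tokens.data(), (int) tokens.size(), add_bos, special);
    tokens.resize(n < 0 ? 0 : n);
    return tokens;
}

// special = false: "<|im_end|>" is split into ordinary character pieces,
//                  consistent with the 51 tokens I see on the device.
// special = true:  "<|im_end|>" maps to its single special token id,
//                  consistent with the 19 tokens from the CLI.
```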
Hey @hvisser, the Java binding isn't based on the newest version of llama.cpp. I think back then some assumptions about the tokenizer were hard-coded, which is why it might be incompatible with phi-2. I'll have a look later today.
I think I've found the issue: https://github.com/ggerganov/llama.cpp/blob/master/examples/main/main.cpp#L257 sets the "special" flag to true when tokenizing the prompt. If I update the tokenize function in java-llama to set that flag to true as well, the prompt is tokenized correctly and the output is correct too. So I think that flag should be set to true? It was introduced in llama.cpp about 3 months ago, but I guess whether it has any effect depends on the model. I'll shoot you a PR if you want ;)
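For illustration, the change would amount to something like this. The function and variable names here are mine, not the actual jllama.cpp code; it just mirrors what main.cpp does at the line linked above, using the helper from common.h of that llama.cpp version:

```cpp
#include "common.h"   // ::llama_tokenize(ctx, text, add_bos, special) helper
#include <string>
#include <vector>

// Hypothetical sketch of the proposed change: tokenize the prompt with
// special = true so that "<|im_start|>" / "<|im_end|>" become their dedicated
// token ids and the EOS check in the generation loop can actually fire.
static std::vector<llama_token> tokenize_prompt(llama_context * ctx,
                                                const std::string & prompt,
                                                bool add_bos) {
    return ::llama_tokenize(ctx, prompt, add_bos, /* special */ true);
}
```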