Tokenizer skips the special tokens while decoding
anandsarth opened this issue · 2 comments
anandsarth commented
Now the mistral-7B-v3 has the tool support there are special tokens like [TOOL_CALL] [TOOL_RESULT]. But when I decode the output from the results the special tokens are not present and there is no argument while decode like in huggingface for skip_speical_tokens=False
Therefore I am not about to know if the output is tool call or standard response. How can I decode the response from the output tokens
output_tokens = [5,1501, 7567,1629,2032,1113] # here token 5 is the special token about the tool call
#after decoding I get
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens)
# result [{"name": "get_current_weather", "arguments": {"location": "San Francisco, CA", "format": "celsius"}}]
#therefore I can't tell it is tool call
carlesonielfa commented
Experiencing the same issue here, did you manage to solve it?
anandsarth commented
I do it via huggingface tokenizer