KeyError: '<ï½\x9cendâ\x96\x81ofâ\x96\x81sentenceï½\x9c>' Tokenizer crash

Question


SinanAkkoyun opened this issue a year ago

ERROR:waitress:Exception while serving /api/notepad_generate
Traceback (most recent call last):
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/waitress/channel.py", line 428, in service
    task.service()
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/waitress/task.py", line 168, in service
    self.execute()
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/waitress/task.py", line 456, in execute
    for chunk in app_iter:
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/werkzeug/wsgi.py", line 256, in __next__
    return self._next()
           ^^^^^^^^^^^^
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/werkzeug/wrappers/response.py", line 32, in _iter_encoded
    for item in iterable:
  File "/home/ai/.mconda3/envs/exl2/lib/python3.11/site-packages/flask/helpers.py", line 115, in generator
    yield from gen
  File "/home/ai/ml/llm/inference/exl2/ex-ui/backend/notepads.py", line 324, in generate
    exclusive_sc.append(tokenizer.extended_piece_to_id[text])
                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
KeyError: '<ï½\x9cendâ\x96\x81ofâ\x96\x81sentenceï½\x9c>'

Hi, this happens with DeepSeek models (Coder and LLM) when I click Generate in the notepad. I haven't encountered it (yet) when clicking >> Token.
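The key in the KeyError looks like mojibake. It is exactly the UTF-8 bytes of DeepSeek's special token `<｜end▁of▁sentence｜>` (fullwidth vertical bar U+FF5C, lower one-eighth block U+2581) decoded as Latin-1. Below is a minimal sketch of a defensive lookup. It assumes `tokenizer.extended_piece_to_id` holds the correctly decoded string, which the traceback does not confirm. `repair_mojibake` and `lookup_piece_id` are hypothetical helpers, not part of ExUI or ExLlamaV2, and the dict is a toy stand-in with a made-up id.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were mis-decoded as Latin-1.

    Returns the text unchanged if it cannot be Latin-1 encoded or the
    resulting bytes are not valid UTF-8 (i.e. it was not this kind of mojibake).
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def lookup_piece_id(piece_to_id: dict, text: str):
    """Hypothetical helper: try the raw string first, then its repaired form.

    Returns None instead of raising KeyError when neither form is present.
    """
    if text in piece_to_id:
        return piece_to_id[text]
    return piece_to_id.get(repair_mojibake(text))


# The exact key from the traceback above.
garbled = "<ï½\x9cendâ\x96\x81ofâ\x96\x81sentenceï½\x9c>"

# Toy stand-in for tokenizer.extended_piece_to_id; 12345 is a made-up id.
toy_piece_to_id = {"<\uff5cend\u2581of\u2581sentence\uff5c>": 12345}

fixed = repair_mojibake(garbled)  # '<｜end▁of▁sentence｜>'
token_id = lookup_piece_id(toy_piece_to_id, garbled)  # 12345
```

Returning `None` rather than raising lets the caller decide whether to skip that stop condition or report it. The real fix is presumably to find where the string was decoded with the wrong codec, which this sketch does not attempt.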