Error with WordLevel tokenizer
markjr-cisco commented
Tried examples/example.py with a tokenizer built from a dict[str, int]:
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
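# str_to_int_dict is the dict[str, int] vocabulary (token -> id);
# the WordLevel model is wrapped by Tokenizer, not the other way round
tokenizer = Tokenizer(WordLevel(str_to_int_dict))
tokenizer.eos_token_id = '\n'  # '\n' here is the token string, not its integer id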
<remaining example.py code>
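In the snippet above, `eos_token_id` is assigned the string `'\n'` rather than an integer id. The standard-library sketch below shows the lookup between the two for a `dict[str, int]` vocabulary. The contents of `str_to_int_dict` are made up for illustration, and `token_to_id` / `id_to_token` are hypothetical helpers. With a real `tokenizers.Tokenizer`, the corresponding methods are `Tokenizer.token_to_id` and `Tokenizer.id_to_token`. The sketch assumes `'\n'` is actually an entry in the vocabulary.

```python
# Hypothetical stand-in for str_to_int_dict; the real vocabulary is not shown in the issue.
str_to_int_dict: dict[str, int] = {"hello": 0, "world": 1, "\n": 2}


def token_to_id(vocab: dict[str, int], token: str) -> int:
    """Return the integer id for `token`, failing loudly if it is missing."""
    try:
        return vocab[token]
    except KeyError:
        raise KeyError(f"{token!r} is not in the vocabulary") from None


def id_to_token(vocab: dict[str, int], token_id: int) -> str:
    """Invert the vocabulary to map an integer id back to its token string."""
    inverse = {i: t for t, i in vocab.items()}
    return inverse[token_id]


# The EOS id is the integer the newline token maps to, not the string itself.
eos_token_id = token_to_id(str_to_int_dict, "\n")
print(eos_token_id)                                      # 2
print(repr(id_to_token(str_to_int_dict, eos_token_id)))  # '\n'
```

Keeping the id (an int) and the token text (a str) in separate variables makes it harder to pass one where the other is expected.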
Stack trace:
Traceback (most recent call last):
  File "<redacted>", line 177, in <module>
    '',
  File "/usr/local/lib/python3.10/dist-packages/parserllm/parserllm.py", line 43, in complete_cf
    terminal_regexes = extract_terminal_regex(parser, tokenizer.decode(tokenizer.eos_token_id))
  File "/usr/local/lib/python3.10/dist-packages/parserllm/parserllm.py", line 14, in extract_terminal_regex
    regex_map['$END'] = regex.compile(stop_token)
  File "<redacted>/.local/lib/python3.10/site-packages/regex/regex.py", line 353, in compile
    return _compile(pattern, flags, ignore_unused, kwargs, cache_pattern)
  File "/<redacted>/.local/lib/python3.10/site-packages/regex/regex.py", line 519, in _compile
    raise TypeError("first argument must be a string or compiled pattern")
TypeError: first argument must be a string or compiled pattern
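The last frames show `regex.compile` being handed a `stop_token` that is not a string. That value comes from `tokenizer.decode(tokenizer.eos_token_id)`, and the trace does not reveal what it actually was. A minimal sketch of an earlier, more descriptive check follows. It uses the standard-library `re` module in place of the third-party `regex` package (both raise `TypeError` for a non-string pattern). `compile_stop_token` is a hypothetical helper, not part of parserllm.

```python
import re


def compile_stop_token(stop_token: object) -> re.Pattern[str]:
    """Hypothetical guard: compile the decoded EOS token, rejecting non-strings early."""
    if not isinstance(stop_token, str):
        # Report the actual type and value instead of a generic regex error.
        raise TypeError(
            f"stop token must be a str, got {type(stop_token).__name__}: {stop_token!r}"
        )
    return re.compile(stop_token)


# A decoded string compiles normally:
pattern = compile_stop_token("\n")
print(bool(pattern.search("line one\nline two")))  # True

# Anything else fails with a message naming the actual type:
try:
    compile_stop_token(None)
except TypeError as exc:
    print(exc)  # stop token must be a str, got NoneType: None
```

Failing with the offending type and value points at the decode step rather than at the regex library.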