Bug with `CodeQwen1.5`: `data did not match any variant of untagged enum PyPreTokenizerTypeWrapper`
QwertyJack opened this issue · 1 comment
QwertyJack commented
When I load CodeQwen1.5-7B-Chat it complains:
File "/home/ubuntu/conda/envs/vllm-0.4.2/lib/python3.11/site-packages/vllm/transformers_utils/tokenizer_group/tokenizer_group.py", line 23, in __init__
self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/conda/envs/vllm-0.4.2/lib/python3.11/site-packages/vllm/transformers_utils/tokenizer.py", line 92, in get_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/conda/envs/vllm-0.4.2/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 862, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/conda/envs/vllm-0.4.2/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/conda/envs/vllm-0.4.2/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/conda/envs/vllm-0.4.2/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 12564 column 3
However, it works fine after downgrading to an earlier version, e.g. `tokenizers==0.15.2`.
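In case it helps anyone debugging the same error: the message points at the `pre_tokenizer` section of `tokenizer.json`, so a quick way to see what the installed `tokenizers` version is being asked to parse is to list the pre-tokenizer types in that file. The following is only a rough sketch using the standard library; the helper names `pre_tokenizer_types` and `describe` are made up for illustration, and the assumption that nested entries live under a `pretokenizers` key is based on how `Sequence` pre-tokenizers are usually serialized.

```python
import json
from importlib import metadata


def pre_tokenizer_types(node):
    """Collect the "type" names found in a tokenizer.json pre_tokenizer section."""
    if node is None:
        return []
    types = [node.get("type", "<missing type>")]
    # Assumption: a "Sequence" pre-tokenizer nests its children under "pretokenizers".
    for child in node.get("pretokenizers", []):
        types.extend(pre_tokenizer_types(child))
    return types


def describe(tokenizer_json_path):
    """Print the installed tokenizers version and the pre-tokenizer types in the file."""
    try:
        version = metadata.version("tokenizers")
    except metadata.PackageNotFoundError:
        version = "not installed"
    with open(tokenizer_json_path, encoding="utf-8") as fh:
        config = json.load(fh)
    print("tokenizers version:", version)
    print("pre_tokenizer types:", pre_tokenizer_types(config.get("pre_tokenizer")))
```

Calling `describe(...)` with the path to the model's local `tokenizer.json` (wherever your download put it) under both the failing and the working `tokenizers` version should at least show which entries the file contains, without claiming which one the parser rejects.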
ArthurZucker commented
Yep, I think there was a problem in the conversion; it was probably done with the `main` branch at some point.