AutoTokenizer broken for v2 models
shwang opened this issue · 2 comments
shwang commented
T5ForConditionalGeneration.from_pretrained(model_name) works, but AutoTokenizer.from_pretrained(model_name) fails with the following error:
~/apps/miniconda3/envs/safe-gpt/lib/python3.8/site-packages/transformers/models/t5/tokenization_t5_fast.py in __init__(self, vocab_file, tokenizer_file, eos_token, unk_token, pad_token, extra_ids, additional_special_tokens, **kwargs)
126 )
127
--> 128 super().__init__(
129 vocab_file,
130 tokenizer_file=tokenizer_file,
~/apps/miniconda3/envs/safe-gpt/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py in __init__(self, *args, **kwargs)
106 elif fast_tokenizer_file is not None and not from_slow:
107 # We have a serialization from tokenizers which let us directly build the backend
--> 108 fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
109 elif slow_tokenizer is not None:
110 # We need to convert a slow tokenizer to build the backend
Exception: No such file or directory (os error 2)
To reproduce this error, run the snippet:
from transformers import AutoTokenizer, T5ForConditionalGeneration
model_name = "allenai/unifiedqa-v2-t5-small-1251000"
model = T5ForConditionalGeneration.from_pretrained(model_name) # OK
tokenizer = AutoTokenizer.from_pretrained(model_name) # Fails
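From the traceback, the fast-tokenizer path resolves a serialized tokenizer file that does not exist on disk. As a quick diagnostic (a minimal sketch, assuming the huggingface_hub package is installed), you can list the files the model repo actually ships and check whether a tokenizer.json is among them:

from huggingface_hub import list_repo_files

# List the files hosted in the model repo; if tokenizer.json is missing,
# the fast tokenizer has no serialized backend to deserialize from.
files = list_repo_files("allenai/unifiedqa-v2-t5-small-1251000")
print(files)
print("has tokenizer.json:", "tokenizer.json" in files)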
yizhouzhao commented
Same issue.
danyaljj commented
Thanks for bringing this to our attention.
I am actually not sure why this is failing, even though it seems to be working just fine with T5Tokenizer.
from transformers import T5Tokenizer, T5ForConditionalGeneration
model_name = "allenai/unifiedqa-v2-t5-small-1251000"
model = T5ForConditionalGeneration.from_pretrained(model_name) # OK
tokenizer = T5Tokenizer.from_pretrained(model_name) # Works okay too ¯\_(ツ)_/¯
For now, I'd suggest using T5Tokenizer instead.
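If you'd rather keep AutoTokenizer in your code, passing use_fast=False should route to the same slow SentencePiece-based tokenizer and skip the fast path that fails above (a sketch, untested against this exact model):

from transformers import AutoTokenizer

model_name = "allenai/unifiedqa-v2-t5-small-1251000"
# use_fast=False avoids building the tokenizers-backed fast tokenizer,
# which is the code path raising "No such file or directory (os error 2)"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)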
Re: AutoTokenizer, I'm not sure whether there is something we can fix on our end or whether this is something the HF folks should address.