illidanlab/personaGPT

Can't install the model with HuggingFace

AlexisPister opened this issue · 2 comments

Hi,

When I try to load the model with HuggingFace like this:

tokenizer = AutoTokenizer.from_pretrained("af1tang/personaGPT")
model = AutoModelForCausalLM.from_pretrained("af1tang/personaGPT")

I get the following error:

Traceback (most recent call last):
  File "/home/alexis/Documents/Projets/StudioArtScience/main.py", line 7, in <module>
    tokenizer = AutoTokenizer.from_pretrained("af1tang/personaGPT")
  File "/home/alexis/anaconda3/envs/StudioArtScience/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 531, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/alexis/anaconda3/envs/StudioArtScience/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1744, in from_pretrained
    return cls._from_pretrained(
  File "/home/alexis/anaconda3/envs/StudioArtScience/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1879, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/alexis/anaconda3/envs/StudioArtScience/lib/python3.10/site-packages/transformers/models/gpt2/tokenization_gpt2_fast.py", line 137, in __init__
    super().__init__(
  File "/home/alexis/anaconda3/envs/StudioArtScience/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 108, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: No such file or directory (os error 2)

Is there a way around this?

mrzjy commented

It seems that the tokenizer paths (the "tokenizer_file" and "name_or_path" fields in tokenizer_config.json) are wrong: they point to a location that does not exist on your machine.

Manually changing the paths in tokenizer_config.json to the correct local path (whether that's the Hugging Face cache directory or any directory where you downloaded the model with git lfs clone https://huggingface.co/af1tang/personaGPT) should solve the problem.
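A minimal sketch of that patch, for illustration only: it creates a stand-in tokenizer_config.json in a temporary directory containing the kind of stale absolute paths shipped with the model, then rewrites both fields to point at the local copy. In practice, replace the temporary directory with the directory of your git-lfs clone (the stale paths shown are placeholders, not the actual values in the repo):

```python
import json
import tempfile
from pathlib import Path

# Stand-in for your local clone directory, e.g. Path("personaGPT").
model_dir = Path(tempfile.mkdtemp())

# Create a fake tokenizer_config.json with stale absolute paths
# (placeholders illustrating the broken state).
config_path = model_dir / "tokenizer_config.json"
config_path.write_text(json.dumps({
    "tokenizer_file": "/some/other/machine/tokenizer.json",
    "name_or_path": "/some/other/machine/personaGPT",
}))

# Patch both fields to point at the local copy instead.
config = json.loads(config_path.read_text())
config["tokenizer_file"] = str(model_dir / "tokenizer.json")
config["name_or_path"] = str(model_dir)
config_path.write_text(json.dumps(config, indent=2))
```

After this, AutoTokenizer.from_pretrained(str(model_dir)) should find the tokenizer file at the patched location.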

You can also bypass this issue by passing use_fast=False to AutoTokenizer.from_pretrained, which falls back to the slow (pure-Python) tokenizer and skips the broken tokenizer_file path entirely:

tokenizer = AutoTokenizer.from_pretrained("af1tang/personaGPT", use_fast=False)