salesforce/CodeT5

Error when loading embedding pre-trained model

pdhung3012 opened this issue · 5 comments

Hello. I tried this simple code snippet to get the embedding from a pre-trained CodeT5+ model:

```python
from transformers import AutoModel, AutoTokenizer

checkpoint = "/home/hungphd/media/git/codet5p-110m-embedding"
device = "cuda"  # for GPU usage, or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True).to(device)

inputs = tokenizer.encode("def print_hello_world():\tprint('Hello World!')", return_tensors="pt").to(device)
embedding = model(inputs)[0]
print(f'Dimension of the embedding: {embedding.size()[0]}, with norm={embedding.norm().item()}')
```

However, I got this error:
```
Traceback (most recent call last):
  File "/home/hungphd/media/git/CodeT5/CodeT5+/code_retrieval/examplePretrainedModel.py", line 8, in <module>
    model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True).to(device)
  File "/home/hungphd/anaconda3/envs/py38v2/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 396, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/hungphd/anaconda3/envs/py38v2/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 529, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/home/hungphd/anaconda3/envs/py38v2/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 278, in __getitem__
    raise KeyError(key)
KeyError: 'codet5p_embedding'
```
I downloaded the pre-trained model from Hugging Face. The pre-trained model's folder looks like this:

https://drive.google.com/file/d/1CdLv5GyNFeIPPufLcUS-5TT53W_4fes4/view?usp=drive_link

Did I do it correctly?

Hello. After checking the error, I found that the cause is that `codet5p_embedding` is not a key defined in this file:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/configuration_auto.py

I checked, and even the latest version on GitHub does not define that key. Is that key specific to your machine (or did you modify `configuration_auto.py` compared to the released version)?
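As a rough sketch of why the `KeyError` appears (an assumption based on the traceback, not the actual transformers source): `AutoConfig` resolves the `model_type` field from `config.json` through a registry dict, so an unknown type raises `KeyError` unless a new-enough transformers with `trust_remote_code=True` can load the custom config class shipped with the checkpoint instead. All names below are illustrative:

```python
# Simplified, illustrative sketch of AutoConfig's dispatch
# (not the actual transformers internals).
CONFIG_MAPPING = {"t5": "T5Config", "bert": "BertConfig"}  # built-in model types

def resolve_config_class(model_type, trust_remote_code=False):
    try:
        return CONFIG_MAPPING[model_type]  # known, built-in model types
    except KeyError:
        if trust_remote_code:
            # A recent transformers would load the custom class referenced
            # by the checkpoint's config.json "auto_map" here.
            return "custom class from checkpoint"
        raise  # old transformers: KeyError: 'codet5p_embedding'

print(resolve_config_class("codet5p_embedding", trust_remote_code=True))
```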

Hello, are you able to run the example script here? I've double-checked it and can run it successfully.

I suspect the error you faced is because you loaded the model from a local folder instead of from the remote Hugging Face Hub. To load a local model checkpoint, you'll have to download the class and config files. Below is example code to load the local model:

```python
from modeling_codet5p_embedding import CodeT5pEmbeddingModel

model = CodeT5pEmbeddingModel.from_pretrained(checkpoint)
```

Thank you for your help. Yes, at first I ran the script on my local Ubuntu 22.04 server (with the code calling the remote Hugging Face Hub checkpoint "Salesforce/codet5p-110m-embedding"), but it returned the same error. So I downloaded the model to my local machine and loaded it from there instead, but the error remained.
Let me try again.

I tried to load it remotely from the Hugging Face Hub. It shows this error:

```
OSError: Can't load 'Salesforce/codet5p-110m-embedding'. Make sure that:

  - 'Salesforce/codet5p-110m-embedding' is a correct model identifier listed on 'https://huggingface.co/models'

  - or 'Salesforce/codet5p-110m-embedding' is the correct path to a directory containing a 'config.json' file
```

I checked my transformers version. I was using the old version 4.11.3, while the current version is 4.32.1. I reinstalled transformers and it worked. Thanks for your help.
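For anyone hitting the same issue: the fix above amounts to checking that the installed transformers is recent enough to handle custom `trust_remote_code` model types. A minimal version-comparison sketch (the 4.11.3 / 4.32.1 numbers come from this thread; the helper below is a hypothetical illustration, not a transformers API):

```python
def version_tuple(v):
    """Turn a dotted version string like '4.11.3' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

installed = "4.11.3"  # the old version that failed in this thread
working = "4.32.1"    # the version that worked after reinstalling

if version_tuple(installed) < version_tuple(working):
    print("transformers is outdated; upgrade with: pip install -U transformers")
```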