nlp-uoregon/trankit

OSError: Can't load config for 'xlm-roberta-base'.

Closed this issue · 2 comments

Hello everyone,

For the past few days I have been getting an error when running a Pipeline.

I am using a fresh install of Python 3.8 with trankit 1.1.1.

Here is the code to reproduce it:

# test_trankit.py
from trankit import Pipeline

p = Pipeline(lang='english')

And here is the error I get:

Downloading: 100%|████████████████████████████████████████████████████████████████| 5.07M/5.07M [00:06<00:00, 733kB/s]
http://nlp.uoregon.edu/download/trankit/v1.0.0/xlm-roberta-base/english.zip
Downloading: 100%|██████████████████████████████████████████████████████████████| 47.9M/47.9M [00:03<00:00, 12.2MiB/s]
Loading pretrained XLM-Roberta, this may take a while...
Traceback (most recent call last):
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/adapter_transformers/configuration_utils.py", line 234, in get_config_dict
    resolved_config_file = cached_path(
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/adapter_transformers/file_utils.py", line 267, in cached_path
    raise EnvironmentError("file {} not found".format(url_or_filename))
OSError: file xlm-roberta-base/config.json not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_trankit.py", line 23, in <module>
    p = Pipeline(lang='english')
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/pipeline.py", line 82, in __init__
    self._embedding_layers = Multilingual_Embedding(self._config)
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/models/base_models.py", line 55, in __init__
    super(Multilingual_Embedding, self).__init__(config, task_name=model_name)
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/models/base_models.py", line 13, in __init__
    self.xlmr = XLMRobertaModel.from_pretrained(config.embedding_name,
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/adapter_transformers/modeling_utils.py", line 578, in from_pretrained
    config, model_kwargs = cls.config_class.from_pretrained(
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/adapter_transformers/configuration_utils.py", line 202, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/kirian/miniconda3/envs/venv38/lib/python3.8/site-packages/trankit/adapter_transformers/configuration_utils.py", line 253, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for 'xlm-roberta-base'. Make sure that:

- 'xlm-roberta-base' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'xlm-roberta-base' is the correct path to a directory containing a config.json file

I've tried adding logging to the trankit code (in the cached_path method), but I didn't manage to debug it.
I suspect a change in the Hugging Face pretrained model config (for example, the config.json file being named differently), but I don't know enough of the context or history to take the debugging further.

Thanks for your help!

Have you found a solution to this problem? Because I'm facing the same problem!

Hi @kirianguiller @peshmerge,
Thanks for letting us know.
This issue might be caused by Trankit getting confused about the folder that contains the cached models.
It can usually be solved by deleting all cached model files and downloading the Trankit models again.
Please reopen this issue if you're still facing it.
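
For anyone landing here later, the cleanup described above could be sketched roughly as follows. This is only an illustration: `clear_model_cache` is a hypothetical helper, not part of Trankit, and the `cache/trankit` path in the usage comment is an assumption about where the models were cached on your machine. Check the actual location before deleting anything.

```python
import shutil
from pathlib import Path


def clear_model_cache(cache_dir):
    """Delete a model cache directory so the models are downloaded again.

    Hypothetical helper for illustration; it is not part of Trankit.
    Returns True if a directory was removed, False if none existed.
    """
    cache_path = Path(cache_dir)
    if not cache_path.is_dir():
        # Nothing to delete, so there is nothing to clean up.
        return False
    # Remove the directory and everything inside it.
    shutil.rmtree(cache_path)
    return True


# Usage (assumption: the models were cached under ./cache/trankit):
#   clear_model_cache("cache/trankit")
#   from trankit import Pipeline
#   p = Pipeline(lang='english')  # should download the models again
```

If the error persists after the cache is cleared, it may be worth checking whether a folder literally named `xlm-roberta-base` exists in the directory the script is run from. The first traceback reports `file xlm-roberta-base/config.json not found`, which suggests the name was treated as a local path, but that reading is a guess and not something confirmed in this thread.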