keras-team/keras-nlp

Issue instantiating a keras_nlp.models.Backbone from a Hugging Face model handle


Describe the bug

I am unable to instantiate a keras_nlp.models.Backbone from a Hugging Face model handle; I get the following error:

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
config.json: 100% 462/462 [00:00<00:00, 18.6kB/s]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-085dc66f3fb4> in <cell line: 1>()
----> 1 backbone = keras_nlp.models.Backbone.from_preset("hf://dmis-lab/biobert-v1.1")

1 frames
/usr/local/lib/python3.10/dist-packages/keras_nlp/src/utils/preset_utils.py in check_config_class(preset, config_file)
    409     with open(config_path) as config_file:
    410         config = json.load(config_file)
--> 411     return keras.saving.get_registered_object(config["registered_name"])
    412 

KeyError: 'registered_name'

To Reproduce
https://colab.research.google.com/drive/1sL3dEM8ZOLCbQ5RM1P6usrrzfqk5aTbQ?usp=sharing

Expected behavior
This line should probably be changed?
https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/utils/preset_utils.py#L411

E.g., the config.json from HF: https://huggingface.co/google-bert/bert-base-uncased/resolve/main/config.json
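
For context, here is a rough, abridged comparison of the two formats (the KerasNLP dict below is illustrative of the Keras serialization payload that check_config_class expects, not copied from an actual preset):

```python
# Illustrative only; both dicts are abridged and not copied verbatim.

# transformers-style config.json (e.g. dmis-lab/biobert-v1.1): plain
# hyperparameters, no Keras serialization metadata.
transformers_config = {
    "model_type": "bert",
    "hidden_size": 768,
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "vocab_size": 28996,
}

# KerasNLP preset config.json: a Keras serialization payload, so
# check_config_class() can resolve the class via "registered_name".
keras_nlp_config = {
    "module": "keras_nlp.src.models.bert.bert_backbone",
    "class_name": "BertBackbone",
    "registered_name": "keras_nlp>BertBackbone",
    "config": {"vocabulary_size": 30522, "num_layers": 12},  # abridged
}

print("registered_name" in transformers_config)  # False -> the KeyError above
print("registered_name" in keras_nlp_config)     # True
```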

Would you like to help us fix it?

I believe this is because you are attempting to load a huggingface/transformers model as a KerasNLP model. The config and weights are in a fundamentally different format. See #1294 for a bit more of a deep dive here.

I think there are two actionable things here:

  • Long term, we want to offer easy ways to convert transformers models to keras_nlp models, and vice versa. TBD on exactly how that should work. A set of tools shipped with keras-nlp? Suggestions welcome.
  • More immediately, we might want to make sure we can detect this case and raise a good error message, especially because both Transformers and KerasNLP use a config.json. It's confusing! A rough sketch of what that check could look like follows this list.
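
As a strawman for the second point (a hypothetical helper, not the actual keras_nlp API), the check could validate the keys before indexing into the dict and name the likely cause:

```python
import json

def check_keras_nlp_config(config_path):
    """Hypothetical pre-check before `config["registered_name"]` is read."""
    with open(config_path) as f:
        config = json.load(f)
    if "registered_name" in config:
        return config  # Looks like a KerasNLP preset; proceed as today.
    if "model_type" in config or "architectures" in config:
        # Strong signal that this is a huggingface/transformers config.
        raise ValueError(
            f"`{config_path}` looks like a huggingface/transformers config, "
            "not a KerasNLP preset. KerasNLP cannot load raw transformers "
            "checkpoints; please pass a KerasNLP preset to `from_preset()`."
        )
    raise ValueError(
        f"`{config_path}` is missing the `registered_name` key expected in a "
        "KerasNLP preset `config.json`."
    )
```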

@Wauplin @SamanehSaadat cc'ing for thoughts.

> More immediately, we might want to make sure we can detect this case and raise a good error message, especially because both Transformers and KerasNLP use a config.json. It's confusing!

I see two places in the code where we could add a check: either when we download from the Hub (check for the metadata.json file, for instance?) or when we load the model from a local preset. Both are valid IMO. If we add the check when downloading the files from HF, it would be HF-specific and therefore not applied if a user manually downloads files and tries to load them (you'll have such usage for sure!). If we add the check when loading the files, it would not be specific to HF, which is somewhat better IMO. It would mean checking the consistency of the config.json against what's expected. WDYT?
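
For reference, the Hub-side variant could be as simple as inspecting the repo's file list before downloading (a sketch only, assuming presets ship a metadata.json as mentioned above; it inherits the caveat that it never runs for manually downloaded files):

```python
from huggingface_hub import HfApi

def looks_like_keras_nlp_preset(repo_id):
    """Cheap Hub-side heuristic: a KerasNLP preset repo includes a metadata.json."""
    files = HfApi().list_repo_files(repo_id)
    return "metadata.json" in files

print(looks_like_keras_nlp_preset("dmis-lab/biobert-v1.1"))  # expected: False
```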

Agree on the longer-term goal as well, but I'll defer that topic to @Rocketknight1, who's more knowledgeable on the transformers side (and it's most likely to be discussed in a separate issue?).

@mattdangerw I think a short-term check that raises a sensible error makes sense as the first step. Longer-term, conversion should be possible - once we detect a transformers checkpoint in KerasNLP, as long as KerasNLP already has support for that architecture, we could just have a mapping for config attributes, tokenizer vocab and layer names to convert the checkpoint.

In theory, simple. In practice, I'm sure there are lots of painful edge cases to worry about, but I suspect we'd get most of the benefit just from supporting a few of the most popular architectures (e.g. Gemma/Llama/Mistral/Mixtral), and maybe that wouldn't be so bad!
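
To make the config half of that concrete, here's an illustrative sketch for BERT (the architecture in this report); the keras_nlp argument names below are BertBackbone's, and a real converter would also need mappings for layer/weight names and the tokenizer vocab, as noted above:

```python
# Illustrative mapping, not shipped code.
HF_BERT_TO_KERAS_NLP = {
    "vocab_size": "vocabulary_size",
    "num_hidden_layers": "num_layers",
    "num_attention_heads": "num_heads",
    "hidden_size": "hidden_dim",
    "intermediate_size": "intermediate_dim",
    "hidden_dropout_prob": "dropout",
    "max_position_embeddings": "max_sequence_length",
    "type_vocab_size": "num_segments",
}

def convert_hf_bert_config(hf_config):
    """Translate a transformers BERT config dict into BertBackbone kwargs."""
    return {
        keras_name: hf_config[hf_name]
        for hf_name, keras_name in HF_BERT_TO_KERAS_NLP.items()
        if hf_name in hf_config
    }
```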

@Wauplin I agree that we should add a check when loading the model to make sure both local and remote presets are covered. I believe the check should be done at the beginning of our from_preset(). I'll add the check.
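
For completeness, the original repro should then fail fast with an actionable message rather than the bare KeyError (exact error type and wording are up to the implementation):

```python
import keras_nlp

# Re-running the report's snippet: today this surfaces KeyError: 'registered_name';
# with the check in place it should raise a clear, descriptive error instead.
try:
    keras_nlp.models.Backbone.from_preset("hf://dmis-lab/biobert-v1.1")
except Exception as e:
    print(type(e).__name__, e)
```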