loading models is painful and not HF compatible
jameshensman opened this issue
To load a sliced model, we first load an uninitialized model, slice it, and then load the checkpoint into it. This is a pain for a few reasons:
- Adding a new model means adding another branch to the if/else switch in `hf_utils`.
- HF users can't easily use our models directly without running slicing themselves. It would be great if users could just call `AutoModelForCausalLM.from_pretrained('microsoft/sliced-llama2-13B-30pc')`. That would require publishing compatible models on the HF Hub, which in turn means defining the sliced model class explicitly.
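One way to drop the per-model if/else is a registry keyed by the HF config's `model_type` string, so supporting a new model means writing one decorated class instead of editing a switch. This is a minimal, hypothetical sketch: `ModelAdapter`, `register_adapter`, `get_adapter` and the placeholder `slice` method are invented names, not part of this repo or of `transformers`, and the returned string stands in for real slicing logic.

```python
from typing import Callable, Dict, Type


class ModelAdapter:
    """Hypothetical base class holding model-specific slicing logic."""

    model_type: str = ""

    def slice(self, new_hidden_size: int) -> str:
        raise NotImplementedError


# Registry keyed by the HF config's `model_type` string (e.g. "llama").
ADAPTERS: Dict[str, Type[ModelAdapter]] = {}


def register_adapter(model_type: str) -> Callable[[Type[ModelAdapter]], Type[ModelAdapter]]:
    """Class decorator: registering a model replaces adding an if/else branch."""

    def wrap(cls: Type[ModelAdapter]) -> Type[ModelAdapter]:
        if model_type in ADAPTERS:
            raise ValueError(f"adapter for {model_type!r} is already registered")
        cls.model_type = model_type
        ADAPTERS[model_type] = cls
        return cls

    return wrap


def get_adapter(model_type: str) -> ModelAdapter:
    """Look up and instantiate the adapter for a config's model_type."""
    try:
        return ADAPTERS[model_type]()
    except KeyError:
        raise ValueError(f"no adapter registered for model_type {model_type!r}") from None


# Placeholder adapters: a real one would slice the model's weight matrices.
@register_adapter("llama")
class LlamaAdapter(ModelAdapter):
    def slice(self, new_hidden_size: int) -> str:
        return f"llama sliced to {new_hidden_size}"


@register_adapter("opt")
class OptAdapter(ModelAdapter):
    def slice(self, new_hidden_size: int) -> str:
        return f"opt sliced to {new_hidden_size}"


print(get_adapter("llama").slice(3584))  # prints "llama sliced to 3584"
```

Dispatching on `model_type` roughly mirrors how the `transformers` auto classes pick an implementation from the config, so the same key could later select a published sliced model class.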
Things to consider for a solution:
- we'd probably need to store the "new hidden size" in the config somehow
- we should make sure this doesn't prevent us from slicing at different levels per layer.
- adding new models should "just work", without the current if/else in `hf_utils`
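As a sketch of the first two considerations, the config could carry one sliced width per layer instead of a single "new hidden size", with uniform slicing as the special case. `SlicedConfig`, `sliced_hidden_sizes` and `uniform` are hypothetical names; in an HF-compatible model these fields would presumably live on a `transformers.PretrainedConfig` subclass, which saves extra attributes to `config.json`. The sketch also assumes the sliced width is `round(hidden_size * (1 - fraction))`, which may not match how the repo actually rounds. The dimensions (hidden size 5120, 40 layers) are Llama 2 13B's and are used only for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SlicedConfig:
    """Hypothetical config extension for a sliced model.

    `hidden_size` is the original width; `sliced_hidden_sizes` stores one
    (possibly different) width per decoder layer.
    """

    model_type: str
    hidden_size: int
    num_hidden_layers: int
    sliced_hidden_sizes: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.sliced_hidden_sizes:
            # Nothing recorded: every layer keeps the full width.
            self.sliced_hidden_sizes = [self.hidden_size] * self.num_hidden_layers
        if len(self.sliced_hidden_sizes) != self.num_hidden_layers:
            raise ValueError("expected exactly one sliced size per layer")
        if any(not 0 < d <= self.hidden_size for d in self.sliced_hidden_sizes):
            raise ValueError("each sliced size must be in (0, hidden_size]")

    @classmethod
    def uniform(cls, model_type: str, hidden_size: int, num_hidden_layers: int,
                fraction: float) -> "SlicedConfig":
        """Same level for every layer; assumes width = round(d * (1 - fraction))."""
        new_size = round(hidden_size * (1 - fraction))
        return cls(model_type, hidden_size, num_hidden_layers, [new_size] * num_hidden_layers)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "SlicedConfig":
        return cls(**json.loads(text))


# Uniform 30% slice with Llama-2-13B-like dimensions survives a JSON round trip.
cfg = SlicedConfig.uniform("llama", hidden_size=5120, num_hidden_layers=40, fraction=0.3)
assert SlicedConfig.from_json(cfg.to_json()) == cfg

# Per-layer slicing levels fit the same schema.
mixed = SlicedConfig("llama", hidden_size=8, num_hidden_layers=3, sliced_hidden_sizes=[8, 6, 4])
print(mixed.sliced_hidden_sizes)  # prints [8, 6, 4]
```

Storing a list keeps the uniform case simple while leaving room for different slicing levels per layer.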