Converting checkpoints to HF after using surgery algorithms
I'm trying to pretrain the Replit code-v1_5 model (https://huggingface.co/replit/replit-code-v1_5-3b), which uses prefix LM. I am able to load and train the model, but I'm having trouble converting the Composer Trainer checkpoints to HF. I get the following error when I try to convert:
##############################
HF checkpoint folder successfully created at 2B-firstpass/HF/.
Loading model from 2B-firstpass/HF/
construction
<class 'llmfoundry.models.layers.norm.LPLayerNorm'>
construction
<class 'llmfoundry.models.layers.attention.GroupedQueryAttention'>
Traceback (most recent call last):
File "/bit-replit/scripts/inference/convert_composer_to_hf.py", line 347, in <module>
convert_composer_to_hf(parse_args())
File "/bit-replit/scripts/inference/convert_composer_to_hf.py", line 338, in convert_composer_to_hf
raise e
File "/bit-replit/scripts/inference/convert_composer_to_hf.py", line 336, in convert_composer_to_hf
_convert_composer_to_hf(args)
File "/bit-replit/scripts/inference/convert_composer_to_hf.py", line 212, in _convert_composer_to_hf
loaded_hf_model = MPTForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/transformers/modeling_utils.py", line 3798, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/bit-replit/llmfoundry/models/mpt/modeling_mpt.py", line 1040, in __init__
self.transformer: MPTModel = self.backbone_model_class(config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/bit-replit/llmfoundry/models/mpt/modeling_mpt.py", line 414, in __init__
self.blocks = self.construct_blocks(config=config,)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/bit-replit/llmfoundry/models/mpt/modeling_mpt.py", line 506, in construct_blocks
return nn.ModuleList([
^
File "/bit-replit/llmfoundry/models/mpt/modeling_mpt.py", line 507, in <listcomp>
self.block_class(
File "/bit-replit/llmfoundry/models/layers/blocks.py", line 107, in __init__
self.attn = build_attention_layer(
^^^^^^^^^^^^^^^^^^^^^^
File "/bit-replit/llmfoundry/models/layers/layer_builders.py", line 95, in build_attention_layer
return construct_from_registry(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/bit-replit/llmfoundry/utils/registry_utils.py", line 162, in construct_from_registry
constructed_item = registered_constructor(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: GroupedQueryAttention.__init__() got an unexpected keyword argument 'prefix_lm'
Looking at the convert_composer_to_hf.py file, it warns:

> .. note:: This function will not work properly if you used surgery algorithms when you trained your model. In that case you will want to load the model weights using the Composer :class:`~composer.Trainer` with the `load_path` argument.

Could you provide an example of this?
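For reference, my rough understanding of that note is the sketch below. The checkpoint path and the `build_model_and_tokenizer_from_yaml` helper are placeholders for however the model gets built from my training YAML; they are not real llm-foundry APIs. Is this the intended usage?

```python
from composer import Trainer

# Placeholder: rebuild the Composer model/tokenizer exactly as during training
# (not an llm-foundry function, just however your training script constructs it).
model, tokenizer = build_model_and_tokenizer_from_yaml('2B-firstpass.yaml')

trainer = Trainer(
    model=model,
    # algorithms=[...],  # the same surgery algorithms used during training
    load_path='2B-firstpass/checkpoints/latest-rank0.pt',  # Composer checkpoint, not the HF folder
    load_weights_only=True,
)
loaded_model = trainer.state.model  # model with the trained weights restored
```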
We removed support for prefix LM in llm-foundry, so you'll need to go back to a commit that still supports it. I believe v0.6.0 should work.
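For example, something along these lines in your llm-foundry checkout (assuming you installed from source; adjust to your setup):

```bash
# Pin llm-foundry to a release that still supports prefix LM,
# then re-run the conversion script from this environment.
git checkout v0.6.0
pip install -e .
```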