[BUG] Size Mismatch When Merging LoRA Model To Base Model
Closed this issue · 10 comments
Prerequisites
- I have read the documentation.
- I have checked other issues for similar problems.
Backend
Hugging Face Space/Endpoints
Interface Used
UI
Error Logs
```
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
    size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([151665, 3584]) from checkpoint, the shape in current model is torch.Size([152064, 3584]).
```
When I try to merge my fine-tuned LoRA adapter (https://huggingface.co/neighborwang/ModeliCo-7B) into the base model Qwen2.5-Coder-7B-Instruct, I get a size-mismatch error, like many users are facing here with base models such as Qwen or Llama: #487.
There is still no solution there, and I don't understand why that issue was closed.
I faced the same issue with Llama 3.1, but there I solved it by pinning a specific transformers version, so I tried the following transformers versions for my adapter and Qwen2.5-Coder-7B-Instruct:
- v4.45.1
- v4.45.0
- v4.44.0
- v4.43.0
- v4.37.0

But none of them work... I need some help.
I don't know why the fine-tuned adapter has a different size than the base model; I would expect AutoTrain to keep them in sync automatically during training.
Is this a bug, or did I do something wrong?
Thanks a lot in advance.
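(Side note on the trace above: the checkpoint shape of 151665 looks like a tokenizer-sized embedding matrix, while the stock Qwen2.5-Coder-7B-Instruct checkpoint uses a padded vocabulary of 152064, which suggests the adapter was saved against a resized embedding layer. Below is a minimal sketch of a possible manual workaround, assuming the adapter repo ships the tokenizer it was trained with; it is not the resolution reached later in this thread, just an illustration of where the mismatch comes from.)

```python
# Sketch of a possible manual workaround (assumption, not confirmed in this thread):
# resize the base model's embeddings to the adapter's tokenizer before loading the adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
adapter_id = "neighborwang/ModeliCo-7B"

# Assumes the adapter repo contains the tokenizer it was trained with.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

# Match the checkpoint's embedding size (151665 in the error above)
# so the state_dict shapes line up when the LoRA weights are loaded.
base.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(base, adapter_id)
merged = model.merge_and_unload()
merged.save_pretrained("merged-model")
tokenizer.save_pretrained("merged-model")
```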
does this error occur when `merge_adapter` is set to `true` in AutoTrain, or are you getting this error when merging manually after the training?
Thanks for your quick reply!
This happens after the fine-tuning process, when I try to merge them manually.
Regarding the merge process, I found the merge adapter here; it seems somewhat outdated and buggy. I modified it (code below) and successfully merged another adapter of mine into a Llama base model.
But when I use it to merge the Qwen2.5 model, I always get the same error, as shown in my description.
```python
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch


def merge(base_model, trained_adapter, token):
    # Load base model
    base = AutoModelForCausalLM.from_pretrained(
        base_model, torch_dtype=torch.float16, low_cpu_mem_usage=True, token=token
    )
    # Load adapter
    model = PeftModel.from_pretrained(base, trained_adapter, token=token)
    try:
        tokenizer = AutoTokenizer.from_pretrained(base_model, token=token)
    except RecursionError:
        tokenizer = AutoTokenizer.from_pretrained(
            base_model, unk_token="<unk>", token=token
        )
    # Merge and unload the adapter
    model = model.merge_and_unload()
    print("Saving target model")
    model.push_to_hub(trained_adapter, token=token)
    tokenizer.push_to_hub(trained_adapter, token=token)
    return gr.Markdown("Model successfully merged and pushed! Please shutdown/pause this space")


with gr.Blocks() as demo:
    gr.Markdown("## AutoTrain Merge Adapter")
    gr.Markdown("Please duplicate this space and attach a GPU in order to use it.")
    token = gr.Textbox(
        label="Hugging Face Write Token", value="", lines=1, max_lines=1, interactive=True, type="password"
    )
    base_model = gr.Textbox(
        label="Base Model (e.g. meta-llama/Llama-2-7b-chat-hf)", value="", lines=1, max_lines=1, interactive=True
    )
    trained_adapter = gr.Textbox(
        label="Trained Adapter Model (e.g. username/autotrain-my-llama)", value="", lines=1, max_lines=1, interactive=True
    )
    submit = gr.Button(value="Merge & Push")
    op = gr.Markdown()
    submit.click(merge, inputs=[base_model, trained_adapter, token], outputs=[op])

if __name__ == "__main__":
    demo.launch()
```
does this function also give you an error?
you can use autotrain tools to merge, which uses the code above:
could you try and let me know if even this gives an error?
Thank you very much!
I'm using this space https://huggingface.co/spaces/autotrain-projects/autotrain-advanced for no-code training, since I don't have a GPU on my local machine and the CPU is very slow.
Is there any possibility to use this merging tool within that space?
i took a look at it and it seems to me that the adapter is merging fine. instead of merging later, could you please use the `merge_adapter` parameter and set it to `true` before training?
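(For anyone training from a CLI config file rather than the UI: `merge_adapter` goes under the training parameters. The YAML below is only a sketch; the key layout is assumed from the autotrain-advanced example configs, so verify it against the version you are running.)

```yaml
# Sketch only: keys assumed from autotrain-advanced example configs.
task: llm-sft
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
project_name: my-qwen-sft

data:
  path: path/to/dataset
  train_split: train
  column_mapping:
    text_column: text

params:
  peft: true
  merge_adapter: true   # merge the LoRA into the base model after training

hub:
  username: your-username
  token: ${HF_TOKEN}
  push_to_hub: true
```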
Hi Abhishek, thank you very much. OK, in that case I will set it to `true` before training.
I'll close this issue for now; if I still have the same issue, I will reopen it!
Thanks!
could you please confirm if it worked for you now? :)
Hi Abhishek, I haven't tested whether it works for Qwen2.5-Coder-7B yet. I have just trained a model based on StarCoder2-15B with `merge_adapter=true`, and it worked without any problem. I think I will fine-tune Qwen2.5-Coder-7B again this or next week using AutoTrain; I will update and confirm whether it works.
@abhishekkrthakur Just fine-tuned a Qwen2.5-Coder-7B using AutoTrain with `merge_adapter=true`. It worked without any problems. Thanks!
great. thank you for confirming.