yule-BUAA/MergeLM

Is there a merged model available for download?

kexul opened this issue · 9 comments

kexul commented

Hi, thanks for the great work! Is there a merged model available on Hugging Face?

Hello,

Thanks for your interest in our work!

Could you please tell me which merged models you would like to download? I can upload them to Hugging Face accordingly.

kexul commented

Maybe the WizardLM series?

Personally, I'd like a model with WizardLM and WizardCoder merged. Maybe we could then call on the legendary TheBloke to quantize it.

Many thanks!

Hi,

I have tried to upload the checkpoints to Hugging Face, but it failed many times due to network connection issues. (XoX)

Could you please run the following command to obtain the checkpoint that you want?

python merge_llms_instruct_math_code.py --merge_instruct --merge_code --merging_method_name mask_merging --use_weight_rescale --weight_mask_rate 0.3 --mask_apply_method task_arithmetic --scaling_coefficient 1.0 --tensor_parallel_size 1

The above command runs on CPUs only and requires about 470GB of memory. Note that if you want to save the checkpoint, please comment out this line, since our code automatically deletes the checkpoint after evaluation.
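If it helps, keeping the in-memory merged model comes down to the standard Hugging Face save calls; a minimal, hypothetical sketch (the argument names are placeholders, not the script's actual variables):

```python
def save_merged_checkpoint(model, tokenizer, save_dir="./saved_merged_model"):
    """Persist a merged model instead of letting the script delete it.
    `model` / `tokenizer` stand in for whatever Hugging Face objects the
    merging script holds at that point; adapt the names to the real code."""
    model.save_pretrained(save_dir)      # writes config.json and the weight shards
    tokenizer.save_pretrained(save_dir)  # writes the tokenizer files alongside
```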

Moreover, since existing model merging methods assume that the models to be merged are fine-tuned from the same backbone, the code model we merge is llama-2-13b-code-alpaca rather than WizardCoder-Python-13B, because WizardCoder-Python-13B is fine-tuned from Code Llama rather than Llama 2.
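For intuition, the recipe that command specifies (DARE's drop-and-rescale on each fine-tuned model's delta parameters, followed by task arithmetic with a scaling coefficient of 1.0) corresponds per tensor to something like the sketch below; it illustrates the idea rather than reproducing the repository's exact implementation:

```python
import torch

def dare_task_arithmetic(base, finetuned_models, mask_rate=0.3, scaling=1.0):
    """Sketch: randomly drop `mask_rate` of each model's delta parameters,
    rescale the survivors by 1 / (1 - mask_rate), then add the summed
    deltas onto the base weights (task arithmetic)."""
    merged = base.clone()
    for ft in finetuned_models:
        delta = ft - base                                      # task vector
        keep = (torch.rand_like(delta) >= mask_rate).to(delta.dtype)
        merged += scaling * delta * keep / (1.0 - mask_rate)   # drop-and-rescale
    return merged
```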

Please feel free to ask if there are any further questions.

kexul commented

470GB of memory space

Sorry, that's not what I can afford as an end user! 😭

OK. Now I am trying to upload the checkpoint to Baidu Wangpan. I will share the link once the upload is complete.

Is this 470GB disk space or RAM?

470GB of memory space Sorry, that's not what I can afford as an end user! 😭

Hi, I have uploaded the checkpoints to Baidu Wangpan. Note that we store separate merged checkpoints for the instruction-following and code-generating models because their tokenizer configurations differ, but their parameters are exactly identical.

The merged checkpoint for the instruction-following task:
Link: https://pan.baidu.com/s/1thtOAGeHlCOZSFvcXgl6hQ
Extraction code: zykq

The merged checkpoint for the code-generating task:
Link: https://pan.baidu.com/s/1mkC3GobfqUbKXqTvY1QCzw
Extraction code: ccu0
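Once downloaded and extracted, the checkpoint should load like any other Llama-2-style model via transformers; a minimal sketch, assuming a placeholder local path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "./merged_checkpoint" is a placeholder for wherever you extract the download.
ckpt_dir = "./merged_checkpoint"

tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
model = AutoModelForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype="auto",  # keep the precision stored in the checkpoint
    device_map="auto",   # requires `accelerate`; spreads the 13B model over GPU/CPU
)
```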

I hope this will help address your issue. ^_^

Is this 470GB disk space or RAM?

It uses about 470GB of RAM.

The disk space required is the same as what the pre-trained backbone takes.

Hi, guys.

Closing this issue now.

Please feel free to reopen it if there are any further questions.