arcee-ai/mergekit

MoE merging failed

Opened this issue · 2 comments

I encountered an error while trying to merge two Qwen-based LoRA models using a mixture of experts (MoE) configuration with the Qwen architecture. I'm working with a phi2_moe2.yml configuration file, but the system throws an error about a missing field (merge_method).

Configuration and Setup

I am using the following YAML configuration:

base_model: CMLM/ZhongJing-2-1_8b
gate_mode: hidden # one of "hidden", "cheap_embed", or "random"
#dtype: float16 # output dtype (float32, float16, or bfloat16)
experts:
  - source_model: CMLM/ZhongJing-2-1_8b
    positive_prompts: []
  - source_model: Qwen2.5-1.5B-Instruct
    positive_prompts: []

When I run this setup, I get the following error:

[2024-11-04 18:51:10] [ERROR] Invalid yaml 1 validation error for MergeConfiguration
merge_method
  Field required [type=missing, input_value={'base_model': 'CMLM/ZhongJing-2-1_8b', 'gate_mode': 'hidden', 'experts': [{'source_model': 'CMLM/ZhongJing-2-1_8b', 'positive_prompts': []}, {'source_model': 'Qwen2.5-1.5B-Instruct', 'positive_prompts': []}]}]

Attempted Solutions
I suspect adding merge_method might resolve the issue, but I'm not sure what options are available for this field (I sketch my best guess after the list below). I would appreciate guidance on:

A complete YAML file for a Qwen MoE merge (including merge_method, if it is required)
Documentation or examples: are there detailed examples or docs that explain each field in the MoE YAML configuration?
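
For reference, here is my best guess at what adding merge_method would look like for these two models; the method (linear) and the weights are guesses on my part, not something I have verified:

models:
  - model: CMLM/ZhongJing-2-1_8b
    parameters:
      weight: 0.5
  - model: Qwen2.5-1.5B-Instruct
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16

If that is on the right track, a list of the valid merge_method values would be very helpful.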
Additional Context
First model: CMLM/ZhongJing-2-1_8b
Second model: Qwen2.5-1.5B-Instruct

Thank you for your assistance!

It looks like you're using the mergekit-yaml command. For this type of config you want to use mergekit-moe.
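
If it helps, the basic invocation looks something like this (the output path here is just an example):

mergekit-moe path/to/phi2_moe2.yml ./output-model-directory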

In addition, this particular merge probably won't work - the two models you are looking at aren't the same size, so they will not be compatible.

I hope this message finds you well. To give more detail, I have the following models:

Base Model: CMLM/ZhongJing-2-1_8b
Fine-Tuned Model: CMLM/ZhongJing-2-1_8b_finetuned (based on Qwen-1.5-1.8B-Chat)

Current Challenge: When attempting to merge these models using a YAML configuration, I continue to encounter the error.

Could you provide an example of a correctly structured YAML file for merging these models? Despite following the available guidelines, my attempts to merge via your space result in errors.

Attempted Configuration: Here's the YAML configuration I used:

base_model: CMLM/ZhongJing-2-1_8b
gate_mode: hidden
dtype: bfloat16
experts:
  - source_model: CMLM/ZhongJing-2-1_8b
    positive_prompts:
      - "Human: 请从中医角度分析以下症状。\nAssistant: 好的,我会从中医理论出发,通过望闻问切的方法进行分析。"
      - "Human: 这些症状在中医理论中属于什么证型?\nAssistant: 让我根据中医辨证论治的原则来分析。"
      - "请解释一下中医的阴阳五行理论如何解释这个症状。"
      - "从中医角度来看,这些食材的性质和功效是什么?"
      - "这些中药的配伍原则是什么?"
    negative_prompts:
      - "What's the molecular mechanism of this drug?"
      - "Please explain the pathophysiology of this condition."
  - source_model: Qwen-1.5-1.8B-Chat
    positive_prompts:
      - "Based on modern medical research, what's the diagnosis?"
      - "What are the evidence-based treatment options for this condition?"
      - "Please explain the pathophysiological mechanism."
      - "What laboratory tests should be ordered?"
      - "According to clinical guidelines, what's the recommended treatment protocol?"
    negative_prompts:
      - "从阴阳五行的角度分析"
      - "请解释一下这个症状的中医证型"

Thank you very much for your time and assistance. I look forward to your guidance on resolving this merging issue.