Lightning-AI/litgpt

Mixtral 8x22B support

SergioG-M opened this issue · 4 comments

Is there any plan to add support for Mixtral 8x22B?

We already support Mixtral 8x7B, so adding Mixtral 8x22B shouldn't be too difficult unless there are unexpected architectural changes between the two models besides their different sizes. It would be nice to add it at some point.
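
For context, "supporting" a model in litgpt mostly means having a config entry in litgpt/config.py that can be resolved by name. A minimal sketch (assuming litgpt is installed) of how the existing 8x7B entry resolves; the attribute names match the fields in the entry proposed below:

    # Model support in litgpt is driven by a named config entry in litgpt/config.py.
    from litgpt import Config

    cfg = Config.from_name("Mixtral-8x7B-Instruct-v0.1")
    print(cfg.n_layer, cfg.n_embd, cfg.n_head, cfg.n_query_groups)   # 8x7B sizes
    print(cfg.mlp_class_name, cfg.n_expert, cfg.n_expert_per_token)  # "LLaMAMoE", 8, 2
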

Looking at the Hub, I think the config we need for this is:

    dict(
        name="Mixtral-8x22B-{}v0.1",
        hf_config=dict(org="mistralai", name="Mixtral-8x22B-{}v0.1"),
        padded_vocab_size=32000,
        block_size=65536,
        n_layer=56,
        n_query_groups=8,
        rotary_percentage=1.0,
        parallel_residual=False,
        bias=False,
        norm_class_name="RMSNorm",
        norm_eps=1e-05,
        mlp_class_name="LLaMAMoE",
        intermediate_size=16384,
        n_head=48,  # double-check
        rope_base=1000000,
        n_expert=8,
        n_expert_per_token=2,
    ),

but I haven't double-checked yet. It's a lot of weights (>600GB) to download.
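
Since only config.json is needed for that double-check rather than the full weights, here is a hedged sketch of how the proposed values could be verified straight from the Hub. The HF-to-litgpt field mapping is my assumption, mirrored from how the 8x7B entry is defined; note that the draft above doesn't set n_embd, which would come from hidden_size:

    # Sketch: cross-check the proposed entry against config.json on the Hub,
    # without downloading any weights. The repo may be gated, so an HF token
    # might be needed (huggingface-cli login).
    import json

    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id="mistralai/Mixtral-8x22B-v0.1", filename="config.json")
    with open(path) as f:
        hf = json.load(f)

    # Assumed mapping from HF MixtralConfig fields to litgpt Config fields,
    # mirroring how the existing Mixtral 8x7B entry is defined.
    proposed = {
        "n_layer": hf["num_hidden_layers"],
        "n_head": hf["num_attention_heads"],          # settles the "double-check" above
        "n_query_groups": hf["num_key_value_heads"],
        "n_embd": hf["hidden_size"],                  # not set in the draft above
        "intermediate_size": hf["intermediate_size"],
        "padded_vocab_size": hf["vocab_size"],        # assuming no extra padding is needed
        "block_size": hf["max_position_embeddings"],
        "rope_base": hf["rope_theta"],
        "n_expert": hf["num_local_experts"],
        "n_expert_per_token": hf["num_experts_per_tok"],
    }
    for key, value in proposed.items():
        print(f"{key} = {value}")
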

Any update on this?

Right now, given that there are so many other things to do, it hasn't been on my priority list. But we'd be happy to accept a contribution if you're interested in adding this model.