Mixtral 8x22B support
SergioG-M opened this issue · 4 comments
SergioG-M commented
Is there any plan to add support for Mixtral 8x22B?
rasbt commented
We already support Mixtral 8x7B, so adding Mixtral 8x22B shouldn't be too difficult unless there are unexpected architectural changes between the two models besides their different sizes. It'd be nice to add it at some point.
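For context, since the 8x7B config is already registered, the 8x22B variant should mostly come down to adding a new config entry. A minimal sketch of how the existing model is resolved by name (this assumes the current litgpt package layout with `from litgpt import Config, GPT`; the 8x22B line is hypothetical until the config is added):

    import torch
    from litgpt import Config, GPT

    # The 8x7B config is already registered and can be resolved by name:
    config = Config.from_name("Mixtral-8x7B-v0.1")
    print(config.n_layer, config.n_head)

    # Instantiate on the meta device so no memory is allocated for the
    # (very large) MoE weights:
    with torch.device("meta"):
        model = GPT(config)

    # Once an 8x22B entry is registered, the same call should work:
    # config = Config.from_name("Mixtral-8x22B-v0.1")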
rasbt commented
Looking at the Hub, I think the config we need for this is

    dict(
        name="Mixtral-8x22B-{}v0.1",
        hf_config=dict(org="mistralai", name="Mixtral-8x22B-{}v0.1"),
        padded_vocab_size=32000,
        block_size=65536,
        n_layer=56,
        n_query_groups=8,
        rotary_percentage=1.0,
        parallel_residual=False,
        bias=False,
        norm_class_name="RMSNorm",
        norm_eps=1e-05,
        mlp_class_name="LLaMAMoE",
        intermediate_size=16384,
        n_head=48,  # double-check
        rope_base=1000000,
        n_expert=8,
        n_expert_per_token=2,
    ),
but I haven't double-checked yet. It's a lot of weights (>600GB) to download.
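One way to double-check these values without pulling the full checkpoint is to fetch only config.json from the Hub and compare its fields against the entry above. A small sketch, assuming the huggingface_hub package and the mistralai/Mixtral-8x22B-v0.1 repo id:

    import json
    from huggingface_hub import hf_hub_download

    # Fetch only config.json (a few KB) instead of the >600 GB of weights.
    path = hf_hub_download(repo_id="mistralai/Mixtral-8x22B-v0.1", filename="config.json")
    with open(path) as f:
        hf_config = json.load(f)

    # Fields to compare against the proposed litgpt config:
    for key in ("num_hidden_layers", "num_attention_heads", "num_key_value_heads",
                "intermediate_size", "vocab_size", "max_position_embeddings",
                "rope_theta", "num_local_experts", "num_experts_per_tok"):
        print(key, hf_config.get(key))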
SergioG-M commented
Any update on this?
rasbt commented
Right now, given how many other things there are to do, this hasn't been on my priority list. But we'd welcome a contribution if you're interested in adding this model.