Mixtral 8x22B support
SergioG-M opened this issue · 4 comments
SergioG-M commented
Is there any plan to add support for Mixtral 8x22B?
rasbt commented
We already support Mixtral 8x7B, so adding Mixtral 8x22B shouldn't be too difficult unless there are unexpected architectural changes between the two models besides their different sizes. It'd be nice to add it at some point.
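For context, since the 8x7B config is already registered, the 8x22B variant should mostly come down to adding a new config entry. A minimal sketch of how the existing model is resolved by name (this assumes the current litgpt package layout with `from litgpt import Config, GPT`; the 8x22B line is hypothetical until the config is added):

    import torch
    from litgpt import Config, GPT

    # The 8x7B config is already registered and can be resolved by name:
    config = Config.from_name("Mixtral-8x7B-v0.1")
    print(config.n_layer, config.n_head)

    # Instantiate on the meta device so no memory is allocated for the
    # (very large) MoE weights:
    with torch.device("meta"):
        model = GPT(config)

    # Once an 8x22B entry is registered, the same call should work:
    # config = Config.from_name("Mixtral-8x22B-v0.1")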
rasbt commented
Looking at the Hub, I think the config we need for this is

    dict(
        name="Mixtral-8x22B-{}v0.1",
        hf_config=dict(org="mistralai", name="Mixtral-8x22B-{}v0.1"),
        padded_vocab_size=32000,
        block_size=65536,
        n_layer=56,
        n_query_groups=8,
        rotary_percentage=1.0,
        parallel_residual=False,
        bias=False,
        norm_class_name="RMSNorm",
        norm_eps=1e-05,
        mlp_class_name="LLaMAMoE",
        intermediate_size=16384,
        n_head=48,  # double-check
        rope_base=1000000,
        n_expert=8,
        n_expert_per_token=2,
    ),
but I haven't double-checked yet. It's a lot of weights (>600GB) to download.
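One way to double-check these values without pulling the full checkpoint is to fetch only config.json from the Hub and compare its fields against the entry above. A small sketch, assuming the huggingface_hub package and the mistralai/Mixtral-8x22B-v0.1 repo id:

    import json
    from huggingface_hub import hf_hub_download

    # Fetch only config.json (a few KB) instead of the >600 GB of weights.
    path = hf_hub_download(repo_id="mistralai/Mixtral-8x22B-v0.1", filename="config.json")
    with open(path) as f:
        hf_config = json.load(f)

    # Fields to compare against the proposed litgpt config:
    for key in ("num_hidden_layers", "num_attention_heads", "num_key_value_heads",
                "intermediate_size", "vocab_size", "max_position_embeddings",
                "rope_theta", "num_local_experts", "num_experts_per_tok"):
        print(key, hf_config.get(key))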
SergioG-M commented
Any update on this?
rasbt commented
Right now, given how many other things there are to do, this hasn't been on my priority list. But we'd welcome a contribution if you're interested in adding this model.