FMInference/FlexLLMGen

Support for MoE models (see Switch Transformer, NLLB)

Opened this issue · 0 comments

fiqas commented

Hi, have you guys considered adding support for Mixture-of-Experts models?
They are usually quite hefty in size, so they would be a great fit for offloading parameters to CPU.

Examples:
Switch Transformers (https://huggingface.co/google/switch-base-256)
NLLB (https://github.com/facebookresearch/fairseq/tree/nllb/)
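For illustration, here is a rough sketch of the idea (this is not FlexLLMGen's actual offloading API; the `OffloadedExperts` class and all of its parameters are made up): keep the expert FFN weights resident on CPU and copy only the experts selected by a top-1 router (Switch Transformer style routing) onto the GPU for each forward pass.

```python
# Hypothetical sketch, not part of FlexLLMGen: MoE expert weights stay on CPU
# and only the experts the router actually picks are copied to the GPU.
import torch
import torch.nn as nn


class OffloadedExperts(nn.Module):
    """Toy MoE layer with top-1 routing and CPU-resident expert FFNs."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int, device: str = "cuda"):
        super().__init__()
        self.device = device
        self.router = nn.Linear(d_model, num_experts).to(device)
        # Expert FFNs are created (and kept) on CPU.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); pick one expert per token.
        expert_idx = self.router(x).argmax(dim=-1)  # (batch, seq)
        out = torch.zeros_like(x)
        for idx in expert_idx.unique().tolist():
            mask = expert_idx == idx
            expert = self.experts[idx]
            expert.to(self.device)   # nn.Module.to() moves parameters in place
            out[mask] = expert(x[mask])
            expert.to("cpu")         # evict the expert back to CPU afterwards
        return out


layer = OffloadedExperts(d_model=512, d_ff=2048, num_experts=8).eval()
x = torch.randn(2, 16, 512, device="cuda")
with torch.no_grad():
    out = layer(x)
```

A real implementation would presumably also want pinned host memory, asynchronous copies on a separate CUDA stream, and caching of frequently used experts to hide the PCIe transfer cost, which is the kind of scheduling FlexLLMGen already does for dense weights.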