Support for MoE models (see Switch Transformer, NLLB)
fiqas commented
Hi, have you guys considered adding support for Mixture-of-Experts models?
They're usually quite large, so being able to offload their parameters to CPU would be a great fit for them.
Examples:
Switch Transformers (https://huggingface.co/google/switch-base-256)
NLLB (https://github.com/facebookresearch/fairseq/tree/nllb/)
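For illustration, here is a toy sketch (not this repo's API, and no real framework calls) of why MoE models suit CPU offloading: with top-1 routing as in Switch Transformer, only the one expert the router selects is needed on the accelerator for a given token, so the remaining experts can stay in host memory. The class and method names below are made up for the example.

```python
# Toy sketch: expert weights live "on CPU" (plain Python lists here) and
# are copied to a simulated device cache only when the router picks them.

class ToyMoELayer:
    def __init__(self, num_experts, dim):
        # All expert weight matrices start out in "CPU" storage.
        self.cpu_experts = {
            e: [[(e + 1) * 0.1] * dim for _ in range(dim)]
            for e in range(num_experts)
        }
        self.device_cache = {}  # experts currently "on the accelerator"

    def route(self, token):
        # Trivial stand-in router: top-1 expert per token,
        # as in Switch Transformer.
        return hash(tuple(token)) % len(self.cpu_experts)

    def fetch_expert(self, e):
        # Transfer the expert to the device only on first use.
        if e not in self.device_cache:
            self.device_cache[e] = self.cpu_experts[e]
        return self.device_cache[e]

    def forward(self, token):
        w = self.fetch_expert(self.route(token))
        # Matrix-vector product with the selected expert's weights.
        return [sum(wi * xi for wi, xi in zip(row, token)) for row in w]


layer = ToyMoELayer(num_experts=8, dim=4)
out = layer.forward([1.0, 2.0, 3.0, 4.0])
print(len(out))                 # 4 (output dimension)
print(len(layer.device_cache))  # 1: only one of the 8 experts was fetched
```

The point of the sketch is the ratio in the last line: one expert resident on the device versus eight in total, which is why parameter offloading pays off so well for sparse MoE checkpoints like switch-base-256.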