moe

There are 200 repositories under the moe topic.

  • vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Language: Python · Stars: 62.5k
  • hiyouga/LLaMA-Factory

    Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

    Language: Python · Stars: 62.1k
  • sgl-project/sglang

    SGLang is a fast serving framework for large language models and vision language models.

    Language: Python · Stars: 20.1k
  • NVIDIA/TensorRT-LLM

    TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

    Language: C++ · Stars: 12.1k
  • modelscope/ms-swift

    Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).

    Language: Python · Stars: 10.9k
  • czy0729/Bangumi

    An unofficial, UI-first https://bgm.tv app client for Android and iOS, built with React Native. An ad-free, hobby-driven, non-profit, ACG-focused anime-tracking client in the style of Douban, serving as a third-party client for bgm.tv. Redesigned for mobile, it ships many built-in enhancements that are hard to achieve on the web version and offers extensive customization options. Currently supports iOS and Android.

    Language: TypeScript · Stars: 4.9k
  • flashinfer-ai/flashinfer

    FlashInfer: Kernel Library for LLM Serving

    Language: Cuda · Stars: 4k
  • zai-org/GLM-4.5

    GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

    Language: Python · Stars: 3.2k
  • PKU-YuanGroup/MoE-LLaVA

    【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models

    Language: Python · Stars: 2.3k
  • MoonshotAI/MoBA

    MoBA: Mixture of Block Attention for Long-Context LLMs

    Language: Python · Stars: 2k
  • davidmrau/mixture-of-experts

    PyTorch re-implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538 (a minimal sketch of this kind of top-k gating appears after this list)

    Language: Python · Stars: 1.2k
  • pjlab-sys4nlp/llama-moe

    ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

    Language: Python · Stars: 995
  • microsoft/Tutel

    Tutel MoE: an optimized Mixture-of-Experts library, supporting GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4

    Language: C · Stars: 935
  • sail-sg/Adan

    Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

    Language: Python · Stars: 802
  • ScienceOne-AI/DeepSeek-671B-SFT-Guide

    An open-source solution for full-parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as practical experience and conclusions gathered along the way.

    Language: Python · Stars: 775
  • open-compass/MixtralKit

    A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI

    Language: Python · Stars: 772
  • ymcui/Chinese-Mixtral

    Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs)

    Language: Python · Stars: 610
  • mindspore-courses/step_into_llm

    MindSpore online courses: Step into LLM

    Language: Jupyter Notebook · Stars: 478
  • kokororin/pixiv.moe

    😘 A Pinterest-style layout site that shows illustrations from pixiv.net, ordered by popularity.

    Language: TypeScript · Stars: 366
  • weigao266/Awesome-Efficient-Arch

    Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

  • SkyworkAI/MoH

    MoH: Multi-Head Attention as Mixture-of-Head Attention

    Language: Python · Stars: 288
  • LISTEN-moe/android-app

    Official LISTEN.moe Android app

    Language: Kotlin · Stars: 273
  • SkyworkAI/MoE-plus-plus

    [ICLR 2025] MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts

    Language: Python · Stars: 250
  • inferflow/inferflow

    Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

    Language: C++ · Stars: 249
  • libgdx/gdx-pay

    A libGDX cross-platform API for InApp purchasing.

    Language: Java · Stars: 234
  • inclusionAI/Ling

    Ling is a MoE LLM provided and open-sourced by InclusionAI.

    Language: Python · Stars: 231
  • IBM/ModuleFormer

    ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Language Models (MoLM) ranging in scale from 4 billion to 8 billion parameters.

    Language: Python · Stars: 224
  • inclusionAI/Ling-V2

    Ling-V2 is a MoE LLM provided and open-sourced by InclusionAI.

    Language: Python · Stars: 217
  • shufangxun/LLaVA-MoD

    [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

    Language: Python · Stars: 207
  • cocowy1/SMoE-Stereo

    [ICCV 2025 Highlight] 🌟🌟🌟 Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts

    Language: Python · Stars: 179
  • junchenzhi/Awesome-LLM-Ensemble

    A curated list of Awesome-LLM-Ensemble papers for the survey "Harnessing Multiple Large Language Models: A Survey on LLM Ensemble"

    Language: HTML · Stars: 158
  • LINs-lab/DynMoE

    [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

    Language: Python · Stars: 143
  • Facico/GOAT-PEFT

    [ICML2025] Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

    Language: Python · Stars: 132
  • kyegomez/SwitchTransformers

    Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity" (a minimal sketch of top-1 routing appears after this list)

    Language: Python · Stars: 129
  • shalldie/chuncai

    A lovely Page Wizard that is responsible for selling moe.

    Language: TypeScript · Stars: 115
  • kyegomez/MoE-Mamba

    Implementation of MoE-Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in PyTorch and Zeta

    Language: Python · Stars: 114
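
The davidmrau/mixture-of-experts entry above re-implements the sparsely-gated mixture-of-experts layer of Shazeer et al. (https://arxiv.org/abs/1701.06538), the routing pattern most repositories under this topic build on. The sketch below is a minimal, generic top-k-gated MoE layer in PyTorch, written for orientation only; the class name `TopKMoE` and all hyperparameters are illustrative assumptions and do not reflect that repository's API.

```python
# Minimal sketch of a top-k sparsely-gated MoE layer (illustrative only;
# not taken from any repository listed above).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.gate(x)                                # (tokens, experts)
        topk_logits, topk_idx = logits.topk(self.k, dim=-1)  # keep k experts per token
        weights = F.softmax(topk_logits, dim=-1)             # renormalize over the chosen k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            rows, slots = (topk_idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if rows.numel() == 0:
                continue  # this expert received no tokens
            out[rows] += weights[rows, slots, None] * expert(x[rows])
        return out

# Usage: 16 tokens of width 64 pass through 8 experts, 2 active per token.
moe = TopKMoE(d_model=64, d_hidden=256)
print(moe(torch.randn(16, 64)).shape)  # torch.Size([16, 64])
```

With k=1 this reduces to Switch-style routing; the second sketch below adds the load-balancing term that keeps experts evenly utilized in that setting.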
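
For the kyegomez/SwitchTransformers entry, the defining idea of the Switch Transformer paper is top-1 routing combined with an auxiliary load-balancing loss. The router below is again a generic illustration under assumed names (`SwitchRouter`) and hyperparameters, not that repository's API.

```python
# Minimal sketch of Switch-style top-1 routing with a load-balancing
# auxiliary loss (illustrative only; not this repository's API).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchRouter(nn.Module):
    """Routes each token to exactly one expert and returns an auxiliary loss
    that pushes token load and router probability toward a uniform split."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.num_experts = num_experts
        self.gate = nn.Linear(d_model, num_experts, bias=False)

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)            # (tokens, experts)
        expert_idx = probs.argmax(dim=-1)                  # top-1 expert per token
        gate_value = probs.gather(-1, expert_idx[:, None]).squeeze(-1)

        # Load-balancing loss from the Switch Transformer paper:
        # num_experts * sum_e f_e * P_e, where f_e is the fraction of tokens
        # routed to expert e and P_e is the mean router probability for e.
        f = F.one_hot(expert_idx, self.num_experts).float().mean(dim=0)
        p = probs.mean(dim=0)
        aux_loss = self.num_experts * torch.sum(f * p)
        return expert_idx, gate_value, aux_loss

# Usage: route 16 tokens of width 64 across 8 experts.
router = SwitchRouter(d_model=64, num_experts=8)
idx, gate, aux = router(torch.randn(16, 64))
print(idx.shape, gate.shape, float(aux))
```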