Pinned Repositories
awesome-lm-system
Summary of system papers, frameworks, code, and tools for training or serving large models
Dipoorlet
Offline quantization tools for deployment.
LightCompress
A powerful toolkit for compressing large models including LLM, VLM, and video generation models.
LightLLM
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
LightX2V
Light Video Generation Inference Framework
MQBench
Model Quantization Benchmark
Qwen-Image-Lightning
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
TFMQ-DM
[CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
United-Perception
United Perception
Wan2.2-Lightning
Wan2.2-Lightning: Speed up the Wan2.2 model with distillation
ModelTC's Repositories
ModelTC/LightLLM
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
ModelTC/Qwen-Image-Lightning
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
ModelTC/MQBench
Model Quantization Benchmark
ModelTC/LightX2V
Light Video Generation Inference Framework
ModelTC/LightCompress
A powerful toolkit for compressing large models including LLM, VLM, and video generation models.
ModelTC/Wan2.2-Lightning
Wan2.2-Lightning: Speed up the Wan2.2 model with distillation
ModelTC/TFMQ-DM
[CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
ModelTC/ComfyUI-Lightx2vWrapper
ComfyUI custom node for lightx2v
ModelTC/EasyLLM
Built on Megatron-DeepSpeed and the HuggingFace Trainer, EasyLLM reorganizes the code logic with a focus on usability while preserving training efficiency.
ModelTC/HarmoniCa
[ICML 2025] This is the official PyTorch implementation of "🎵 HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration".
ModelTC/OmniBal
[ICML 2025] This is the official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance".
ModelTC/quant_horizon
ModelTC/general-sam
A general suffix automaton implementation in Rust with Python bindings
ModelTC/LightTTS
Light-tts is a lightweight TTS inference framework optimized for CosyVoice2, enabling fast and scalable speech synthesis in Python.
ModelTC/general-sam-py
Python bindings for general-sam and some utilities
ModelTC/mtc-token-healing
Token healing implementation in Rust
ModelTC/LightKernel
ModelTC/SageAttention
Quantized attention that achieves speedups of 2-5x and 3-11x over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
ModelTC/lightllm-blog
ModelTC/opencompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
ModelTC/verl
verl: Volcano Engine Reinforcement Learning for LLMs
ModelTC/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
ModelTC/fa3
ModelTC/flash-attention
Fast and memory-efficient exact attention
ModelTC/flash-attn-3-build
ModelTC/greedy-tokenizer
Greedily tokenize strings with the longest tokens iteratively.
ModelTC/lightx2v_comfyui_node
ModelTC/LLM_QAT
ModelTC/modeltc.github.io
ModelTC/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)