ModelTC

Model Infra

Pinned Repositories

awesome-lm-system
Summary of system papers, frameworks, code, and tools for training or serving large models
57 8 05
Dipoorlet
Offline quantization tools for deployment.
Language: Python
141 14 1219
LightCompress
A powerful toolkit for compressing large models including LLM, VLM, and video generation models.
Language: Python
613 9 9862
LightLLM
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Language: Python
3.7k 31 225282
LightX2V
Light Video Generation Inference Framework
Language: Python
771 9 4948
MQBench
Model Quantization Benchmark
Language: Python
847 14 206142
Qwen-Image-Lightning
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
Language: Python
919 13 2536
TFMQ-DM
[CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
Language: Jupyter Notebook
106 8 124
United-Perception
United Perception
Language: Python
436 20 6667
Wan2.2-Lightning
Wan2.2-Lightning: Speed up the Wan2.2 model with distillation
Language: Python
214 5 1813

ModelTC's Repositories

ModelTC/LightLLM
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Language: Python
3.7k 31 225282
ModelTC/Qwen-Image-Lightning
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
Language: Python
919 13 2536
ModelTC/MQBench
Model Quantization Benchmark
Language: Python
847 14 206142
ModelTC/LightX2V
Light Video Generation Inference Framework
Language: Python
771 9 4948
ModelTC/LightCompress
A powerful toolkit for compressing large models including LLM, VLM, and video generation models.
Language: Python
613 9 9862
ModelTC/Wan2.2-Lightning
Wan2.2-Lightning: Speed up the Wan2.2 model with distillation
Language: Python
214 5 1813
ModelTC/TFMQ-DM
[CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
Language: Jupyter Notebook
106 8 124
ModelTC/ComfyUI-Lightx2vWrapper
ComfyUI custom node for lightx2v
Language: Python
495
ModelTC/EasyLLM
Built upon Megatron-Deepspeed and HuggingFace Trainer, EasyLLM has reorganized the code logic with a focus on usability. While enhancing usability, it also ensures training efficiency.
Language: Python
48 6 18
ModelTC/HarmoniCa
[ICML 2025] This is the official PyTorch implementation of "🎵 HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration".
Language: Python
42 5 41
ModelTC/OmniBal
[ICML 2025] This is the official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance".
Language: Python
24 6 43
ModelTC/quant_horizon
Language: Cuda
11 6 01
ModelTC/general-sam
A general suffix automaton implementation in Rust with Python bindings
Language: Rust
8 6 10
ModelTC/LightTTS
Light-tts is a lightweight TTS inference framework optimized for CosyVoice2, enabling fast and scalable speech synthesis in Python.
Language: Python
7 0 0
ModelTC/general-sam-py
Python bindings for general-sam and some utilities
Language: Python
5 6 00
ModelTC/mtc-token-healing
Token healing implementation in Rust
Language: Rust
4 6 00
ModelTC/LightKernel
Language: HTML
2 0 0
ModelTC/SageAttention
Quantized attention that achieves speedups of 2-5x and 3-11x over FlashAttention and xformers, respectively, without losing end-to-end metrics across language, image, and video models.
Language: Cuda
2 0 0
ModelTC/lightllm-blog
Language: SCSS
1 6 0
ModelTC/opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Language: Python
1 0 0
ModelTC/verl
verl: Volcano Engine Reinforcement Learning for LLMs
Language: Python
1 0 01
ModelTC/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python
1 0
ModelTC/fa3
Language: Python
0 0
ModelTC/flash-attention
Fast and memory-efficient exact attention
Language: Python
0 0
ModelTC/flash-attn-3-build
Language: Dockerfile
5 02
ModelTC/greedy-tokenizer
Greedily tokenize strings with the longest tokens iteratively.
Language: Python
6 0
ModelTC/lightx2v_comfyui_node
5 0
ModelTC/LLM_QAT
Language: Python
ModelTC/modeltc.github.io
Language: HTML
5 0
ModelTC/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Language: Python
0 0