Pinned Repositories
bevfusion
[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
efficientvit
Efficient vision foundation models for high-resolution generation and perception.
llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
once-for-all
[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment
proxylessnas
[ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
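The cache policy behind attention sinks is simple enough to sketch. Below is a minimal, illustrative PyTorch version (the function name and default sizes are assumptions, not the repo's API): keep the first few "sink" tokens plus a sliding window of the most recent tokens, and evict everything in between.

```python
import torch


def evict_kv_cache(keys: torch.Tensor,
                   values: torch.Tensor,
                   num_sink: int = 4,
                   window: int = 2044):
    """keys/values: [batch, heads, seq_len, head_dim]."""
    seq_len = keys.size(2)
    if seq_len <= num_sink + window:
        return keys, values  # nothing to evict yet
    keep = torch.cat([
        torch.arange(num_sink, device=keys.device),                   # attention sinks
        torch.arange(seq_len - window, seq_len, device=keys.device),  # recent tokens
    ])
    return keys[:, :, keep], values[:, :, keep]


# Example: a 4096-token cache is trimmed to 4 sink tokens + 2044 recent tokens.
k, v = torch.randn(1, 8, 4096, 64), torch.randn(1, 8, 4096, 64)
k, v = evict_kv_cache(k, v)
print(k.shape)  # torch.Size([1, 8, 2048, 64])
```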
temporal-shift-module
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
torchquantum
A PyTorch-based framework for quantum-classical simulation, quantum machine learning, quantum neural networks, and parameterized quantum circuits, with support for easy deployment on real quantum computers.
torchsparse
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
MIT HAN Lab's Repositories
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
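A rough sketch of the AWQ idea in a simulated (fake-quant) setting, with illustrative names: scale up salient weight channels in proportion to activation magnitude, quantize the weights group-wise to 4 bits, and fold the inverse scale back. The actual repo searches the scaling exponent per layer and ships fused kernels.

```python
import torch


def pseudo_quantize(w: torch.Tensor, n_bits: int = 4, group_size: int = 128):
    """Simulated group-wise asymmetric round-to-nearest quantization.
    Assumes in_features is divisible by group_size."""
    out_ch, in_ch = w.shape
    w = w.reshape(out_ch, in_ch // group_size, group_size)
    w_max, w_min = w.amax(dim=-1, keepdim=True), w.amin(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-5) / (2 ** n_bits - 1)
    zero = (-w_min / scale).round()
    q = (w / scale + zero).round().clamp(0, 2 ** n_bits - 1)
    return ((q - zero) * scale).reshape(out_ch, in_ch)


def awq_fake_quant(weight: torch.Tensor, act_scale: torch.Tensor, alpha: float = 0.5):
    """weight: [out_features, in_features]; act_scale: mean |activation| per input channel.
    AWQ searches alpha instead of fixing it; 0.5 here is only a placeholder."""
    s = act_scale.clamp(min=1e-5) ** alpha   # activation-aware per-channel scales
    w_q = pseudo_quantize(weight * s)        # salient channels are scaled up before quantization
    return w_q / s                           # equivalent weight; 1/s can also be folded into X
```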
mit-han-lab/efficientvit
Efficient vision foundation models for high-resolution generation and perception.
mit-han-lab/bevfusion
[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
mit-han-lab/temporal-shift-module
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
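The temporal shift itself is only a few lines of PyTorch; the sketch below mirrors the published zero-FLOP shift (the function name is illustrative): move 1/8 of the channels one step left in time, 1/8 one step right, and leave the rest in place.

```python
import torch


def temporal_shift(x: torch.Tensor, n_segment: int, fold_div: int = 8):
    """x: [N*T, C, H, W], where each video contributes T = n_segment frames."""
    nt, c, h, w = x.shape
    x = x.view(nt // n_segment, n_segment, c, h, w)
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                    # shift left in time
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]    # shift right in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]               # remaining channels untouched
    return out.view(nt, c, h, w)
```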
mit-han-lab/torchquantum
A PyTorch-based framework for quantum-classical simulation, quantum machine learning, quantum neural networks, and parameterized quantum circuits, with support for easy deployment on real quantum computers.
mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
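SmoothQuant's offline smoothing step follows the paper's formula s_j = max|X_j|^alpha / max|W_j|^(1-alpha); the helper below is an illustrative sketch, not the repo's API, and assumes the inverse scales are folded into the preceding LayerNorm or activations.

```python
import torch


def smooth_linear(weight: torch.Tensor, act_absmax: torch.Tensor, alpha: float = 0.5):
    """weight: [out_features, in_features]; act_absmax: calibration max |X_j| per input channel.
    Returns the smoothed weight and the per-channel scales s."""
    w_absmax = weight.abs().amax(dim=0)                              # per input channel
    s = (act_absmax.clamp(min=1e-5) ** alpha
         / w_absmax.clamp(min=1e-5) ** (1 - alpha)).clamp(min=1e-5)
    return weight * s, s   # use (X / s) @ (weight * s).T == X @ weight.T


# Sanity check of the mathematical equivalence:
w, x = torch.randn(64, 32), torch.randn(4, 32)
w_s, s = smooth_linear(w, x.abs().amax(dim=0))
assert torch.allclose(x @ w.T, (x / s) @ w_s.T, atol=1e-5)
```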
mit-han-lab/proxylessnas
[ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
mit-han-lab/torchsparse
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
mit-han-lab/data-efficient-gans
[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training
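The training rule is easy to sketch: apply the same differentiable augmentation to both real and generated images wherever the discriminator sees them, so gradients still reach the generator through the augmentation. The augmentation choice, loss, and names (d_net, g_net, z) below are illustrative, not the repo's API.

```python
import torch
import torch.nn.functional as F


def diff_augment(x: torch.Tensor, brightness: float = 0.5) -> torch.Tensor:
    """One differentiable augmentation (random brightness); the paper composes several."""
    shift = (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5) * brightness
    return x + shift


def d_loss(d_net, g_net, real, z):
    """Discriminator step: the augmentation is applied to BOTH real and fake images."""
    fake = g_net(z).detach()
    return (F.softplus(-d_net(diff_augment(real))).mean()
            + F.softplus(d_net(diff_augment(fake))).mean())


def g_loss(d_net, g_net, z):
    """Generator step: gradients reach G through the differentiable augmentation."""
    return F.softplus(-d_net(diff_augment(g_net(z)))).mean()
```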
mit-han-lab/nunchaku
[ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
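The decomposition behind SVDQuant can be sketched in a few lines (illustrative only; it omits the paper's outlier-smoothing step and the fused inference kernels): keep a small low-rank branch in 16 bit and quantize only the residual to 4 bit, which is far easier to quantize than the full weight.

```python
import torch


def pseudo_quantize_4bit(w: torch.Tensor, group_size: int = 64):
    """Simulated symmetric group-wise 4-bit quantization (in_features % group_size == 0)."""
    out_ch, in_ch = w.shape
    w = w.reshape(out_ch, in_ch // group_size, group_size)
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / 7
    return (w / scale).round().clamp(-8, 7).mul(scale).reshape(out_ch, in_ch)


def svdquant_decompose(weight: torch.Tensor, rank: int = 32):
    """weight: [out, in] -> low-rank 16-bit branch (L1, L2) plus a 4-bit residual."""
    u, s, vh = torch.linalg.svd(weight, full_matrices=False)
    l1, l2 = u[:, :rank] * s[:rank], vh[:rank]
    residual_q = pseudo_quantize_4bit(weight - l1 @ l2)
    return l1, l2, residual_q   # forward: x @ l2.T @ l1.T + x @ residual_q.T
```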
mit-han-lab/tinyengine
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
mit-han-lab/omniserve
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
mit-han-lab/distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
mit-han-lab/fastcomposer
[IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
mit-han-lab/hart
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
mit-han-lab/spvnas
[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
mit-han-lab/lite-transformer
[ICLR 2020] Lite Transformer with Long-Short Range Attention
mit-han-lab/duo-attention
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
mit-han-lab/ComfyUI-nunchaku
ComfyUI plugin for Nunchaku (SVDQuant 4-bit diffusion model inference)
mit-han-lab/deepcompressor
Model Compression Toolbox for Large Language Models and Diffusion Models
mit-han-lab/hardware-aware-transformers
[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
mit-han-lab/Block-Sparse-Attention
A sparse attention kernel supporting mixed sparse patterns
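For reference, the semantics such a kernel implements can be written in plain PyTorch; this dense version is illustrative only (the repo provides fused GPU kernels, and the mask layout here is an assumption): a boolean block mask decides which blocks of the attention matrix are kept.

```python
import torch


def block_sparse_attention(q, k, v, block_mask, block_size: int = 64):
    """q, k, v: [heads, seq, dim]; block_mask: [seq//block_size, seq//block_size] bool.
    Assumes seq is divisible by block_size and every query row keeps at least one block."""
    scores = (q @ k.transpose(-1, -2)) * q.shape[-1] ** -0.5      # [heads, seq, seq]
    dense_mask = block_mask.repeat_interleave(block_size, 0) \
                           .repeat_interleave(block_size, 1)
    scores = scores.masked_fill(~dense_mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```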
mit-han-lab/vila-u
[ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
mit-han-lab/Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
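The page-selection criterion is compact enough to sketch; the helper below is illustrative, not the repo's kernels: bound each KV-cache page's possible attention score using the per-page element-wise key min/max, then attend only to the top-scoring pages.

```python
import torch


def select_pages(query: torch.Tensor, keys: torch.Tensor,
                 page_size: int = 16, top_k: int = 8) -> torch.Tensor:
    """query: [head_dim]; keys: [seq_len, head_dim]. Returns indices of selected pages
    (the trailing partial page is ignored in this sketch)."""
    seq_len, dim = keys.shape
    pages = keys[: seq_len // page_size * page_size].view(-1, page_size, dim)
    k_min, k_max = pages.amin(dim=1), pages.amax(dim=1)          # [n_pages, dim]
    bound = torch.maximum(query * k_min, query * k_max).sum(-1)  # upper bound on q . k
    return bound.topk(min(top_k, bound.numel())).indices
```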
mit-han-lab/x-attention
XAttention: Block Sparse Attention with Antidiagonal Scoring
mit-han-lab/spatten
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
mit-han-lab/patch_conv
Patch convolution to avoid large GPU memory usage of Conv2D
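A minimal sketch of the idea, assuming a stride-1 Conv2d with an odd kernel and "same" padding (the helper name and patch count are illustrative, not the repo's API): pad once, convolve overlapping height chunks with a small halo, and concatenate, so one giant Conv2d call is replaced by several smaller ones.

```python
import torch
import torch.nn.functional as F


def patch_conv2d(x: torch.Tensor, conv: torch.nn.Conv2d, num_patches: int = 4):
    """Equivalent to conv(x) for stride-1, odd-kernel, 'same'-padding convolutions,
    but splits the work so no single convolution call needs a huge workspace."""
    k = conv.kernel_size[0]
    p = (k - 1) // 2
    x_pad = F.pad(x, (p, p, p, p))                     # pad once, globally
    h = x.shape[-2]
    chunk = (h + num_patches - 1) // num_patches
    outs = []
    for start in range(0, h, chunk):
        end = min(start + chunk, h)
        piece = x_pad[..., start:end + 2 * p, :]       # height chunk plus halo rows
        outs.append(F.conv2d(piece, conv.weight, conv.bias, stride=1, padding=0))
    return torch.cat(outs, dim=-2)


# Matches the direct convolution:
conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
x = torch.randn(1, 3, 512, 512)
assert torch.allclose(conv(x), patch_conv2d(x, conv), atol=1e-5)
```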
mit-han-lab/tinychat-tutorial
mit-han-lab/VisCompare
A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders
mit-han-lab/sparserefine
[ECCV 2024] SparseRefine: Sparse Refinement for Efficient High-Resolution Semantic Segmentation