Pinned Repositories
bevfusion
[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
data-efficient-gans
[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training
efficientvit
Efficient vision foundation models for high-resolution generation and perception.
llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
once-for-all
[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment
proxylessnas
[ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
temporal-shift-module
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
torchquantum
A PyTorch-based framework for Quantum Classical Simulation, Quantum Machine Learning, Quantum Neural Networks, Parameterized Quantum Circuits with support for easy deployments on real quantum computers.
MIT HAN Lab's Repositories
mit-han-lab/streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
mit-han-lab/efficientvit
Efficient vision foundation models for high-resolution generation and perception.
mit-han-lab/bevfusion
[ICRA'23] BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation
mit-han-lab/temporal-shift-module
[ICCV 2019] TSM: Temporal Shift Module for Efficient Video Understanding
mit-han-lab/proxylessnas
[ICLR 2019] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
mit-han-lab/torchquantum
A PyTorch-based framework for Quantum Classical Simulation, Quantum Machine Learning, Quantum Neural Networks, Parameterized Quantum Circuits with support for easy deployments on real quantum computers.
mit-han-lab/data-efficient-gans
[NeurIPS 2020] Differentiable Augmentation for Data-Efficient GAN Training
mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
mit-han-lab/torchsparse
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
mit-han-lab/gan-compression
[CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs
mit-han-lab/tinyengine
[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
mit-han-lab/fastcomposer
[IJCV] FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
mit-han-lab/lite-transformer
[ICLR 2020] Lite Transformer with Long-Short Range Attention
mit-han-lab/distrifuser
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
mit-han-lab/spvnas
[ECCV 2020] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution
mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
mit-han-lab/duo-attention
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
mit-han-lab/hart
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
mit-han-lab/hardware-aware-transformers
[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
mit-han-lab/litepose
[CVPR'22] Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation
mit-han-lab/Quest
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
mit-han-lab/vila-u
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
mit-han-lab/lmquant
mit-han-lab/patch_conv
Patch convolution to avoid the large GPU memory usage of Conv2D
mit-han-lab/spatten
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
mit-han-lab/sparsevit
[CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
mit-han-lab/Block-Sparse-Attention
A sparse attention kernel supporting mixed sparse patterns
mit-han-lab/tinychat-tutorial