efficient-inference
There are 68 repositories under the efficient-inference topic.
huawei-noah/Efficient-AI-Backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
SqueezeAILab/LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
snap-research/EfficientFormer
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
huawei-noah/AdderNet
Code for paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"
horseee/DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
VITA-Group/LightGaussian
[NeurIPS 2024 Spotlight] "LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
Zhen-Dong/Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
SqueezeAILab/SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
liuzhuang13/slimming
Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017).
SqueezeAILab/KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Picovoice/picollm
On-device LLM Inference Powered by X-Bit Quantization
lucidrains/speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
SYSU-SAIL/SMSR
[CVPR 2021] Exploring Sparsity in Image Super-Resolution for Efficient Inference
xindongzhang/ELAN
[ECCV 2022] Efficient Long-Range Attention Network for Image Super-resolution
changlin31/DS-Net
(CVPR 2021, Oral) Dynamic Slimmable Network
liuziwei7/mobile-id
Deep Face Model Compression
czg1225/AsyncDiff
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
xuyang-liu16/Awesome-Generation-Acceleration
📚 Collection of awesome generation acceleration resources.
cure-lab/DeciWatch
[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"
horseee/learning-to-cache
[NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
kssteven418/BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
RAIVNLab/STR
Soft Threshold Weight Reparameterization for Learnable Sparsity
snap-research/graphless-neural-networks
[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)
Alpha-Innovator/AdaptiveDiffusion
[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
qiuk2/AAR
[Official Implementation] Acoustic Autoregressive Modeling 🔥
raymin0223/fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
FranxYao/Partially-Observed-TreeCRFs
Implementation of AAAI 21 paper: Nested Named Entity Recognition with Partially Observed TreeCRFs
IBM/AdaMML
Official implementation of AdaMML. https://arxiv.org/abs/2105.05165.
tchittesh/lzu
Code for Learning to Zoom and Unzoom (CVPR 2023)
ivclab/agegenderLMTCNN
Jia-Hong Lee, Yi-Ming Chan, Ting-Yen Chen, and Chu-Song Chen, "Joint Estimation of Age and Gender from Unconstrained Face Images using Lightweight Multi-task CNN for Mobile Applications," IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2018
yikaiw/RS-Nets
[ECCV 2020] Code release for "Resolution Switchable Networks for Runtime Efficient Image Recognition"
bharathsudharsan/TinyML-Benchmark-NNs-on-MCUs
Code for WF-IoT paper 'TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers'
linksense/EfficientNet.PyTorch
Concise, Modular, Human-friendly PyTorch implementation of EfficientNet with Pre-trained Weights.
snu-mllab/LayerMerge
Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML'24)
Zhen-Dong/CoDeNet
[FPGA'21] CoDeNet is an efficient object detection model on PyTorch, with SOTA performance on VOC and COCO based on CenterNet and Co-Designed deformable convolution.
bharathsudharsan/CNN_on_MCU
Code for paper 'Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware'