efficient-inference
There are 59 repositories under the efficient-inference topic.
huawei-noah/Efficient-AI-Backbones
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
SqueezeAILab/LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
snap-research/EfficientFormer
EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022]
huawei-noah/AdderNet
Code for paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"
horseee/DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
SqueezeAILab/SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
liuzhuang13/slimming
Learning Efficient Convolutional Networks through Network Slimming, in ICCV 2017.
VITA-Group/LightGaussian
"LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang
Zhen-Dong/Awesome-Quantization-Papers
List of papers related to neural network quantization in recent AI conferences and journals.
The-Learning-And-Vision-Atelier-LAVA/SMSR
[CVPR 2021] Exploring Sparsity in Image Super-Resolution for Efficient Inference
changlin31/DS-Net
(CVPR 2021, Oral) Dynamic Slimmable Network
SqueezeAILab/KVQuant
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
liuziwei7/mobile-id
Deep Face Model Compression
xindongzhang/ELAN
[ECCV 2022] Efficient Long-Range Attention Network for Image Super-resolution
cure-lab/DeciWatch
[ECCV 2022] Official implementation of the paper "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"
lucidrains/speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
Picovoice/picollm
On-device LLM Inference Powered by X-Bit Quantization
RAIVNLab/STR
Soft Threshold Weight Reparameterization for Learnable Sparsity
kssteven418/BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
snap-research/graphless-neural-networks
[ICLR 2022] Code for Graph-less Neural Networks: Teaching Old MLPs New Tricks via Distillation (GLNN)
FranxYao/Partially-Observed-TreeCRFs
Implementation of AAAI 21 paper: Nested Named Entity Recognition with Partially Observed TreeCRFs
IBM/AdaMML
Official implementation of AdaMML. https://arxiv.org/abs/2105.05165.
tchittesh/lzu
Code for Learning to Zoom and Unzoom (CVPR 2023)
raymin0223/fast_robust_early_exit
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)
ivclab/agegenderLMTCNN
Jia-Hong Lee, Yi-Ming Chan, Ting-Yen Chen, and Chu-Song Chen, "Joint Estimation of Age and Gender from Unconstrained Face Images using Lightweight Multi-task CNN for Mobile Applications," IEEE International Conference on Multimedia Information Processing and Retrieval, MIPR 2018
yikaiw/RS-Nets
[ECCV 2020] Code release for "Resolution Switchable Networks for Runtime Efficient Image Recognition"
bharathsudharsan/TinyML-Benchmark-NNs-on-MCUs
Code for WF-IoT paper 'TinyML Benchmark: Executing Fully Connected Neural Networks on Commodity Microcontrollers'
linksense/EfficientNet.PyTorch
Concise, Modular, Human-friendly PyTorch implementation of EfficientNet with Pre-trained Weights.
Zhen-Dong/CoDeNet
[FPGA'21] CoDeNet is an efficient object detection model on PyTorch, with SOTA performance on VOC and COCO based on CenterNet and Co-Designed deformable convolution.
bharathsudharsan/CNN_on_MCU
Code for paper 'Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware'
VITA-Group/triple-wins
[ICLR 2020] "Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference"
ivclab/NeuralMerger
Yi-Min Chou, Yi-Ming Chan, Jia-Hong Lee, Chih-Yi Chiu, Chu-Song Chen, "Unifying and Merging Well-trained Deep Neural Networks for Inference Stage," International Joint Conference on Artificial Intelligence (IJCAI), 2018
xternalz/SDPoint
Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks
ivclab/Multistage_Pruning
Cheng-Hao Tu, Jia-Hong Lee, Yi-Ming Chan and Chu-Song Chen, "Pruning Depthwise Separable Convolutions for MobileNet Compression," International Joint Conference on Neural Networks, IJCNN 2020, July 2020.
snap-research/linkless-link-prediction
[ICML 2023] Linkless Link Prediction via Relational Distillation
IBM/AutoVP
[ICLR 2024] AutoVP: An Automated Visual Prompting Framework and Benchmark