inference-acceleration
There are 8 repositories under the inference-acceleration topic.
czg1225/AsyncDiff
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
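The core idea is to break the strictly sequential denoising loop: the denoiser is split into stages, and at step t each stage consumes the activations its predecessor produced at step t-1, so all stages can run concurrently on different devices. A minimal conceptual sketch of that pattern follows; the names (`async_denoise`, `stages`) are hypothetical and this is not the authors' implementation.

```python
# Conceptual sketch of asynchronous denoising (not AsyncDiff's actual code).
# The denoiser is split into sequential stages; at step t, stage k reads the
# output stage k-1 produced at step t-1, breaking the serial dependency so
# the stages could execute in parallel on separate devices.

def async_denoise(stages, x_T, num_steps):
    """stages: list of callables, logically one per device."""
    num_stages = len(stages)
    # Warm-up: one fully serial pass to populate the inter-stage cache.
    cache = [None] * num_stages
    h = x_T
    for k, stage in enumerate(stages):
        h = stage(h)
        cache[k] = h
    x = h
    for t in range(1, num_steps):
        # The calls below are mutually independent: stage k only needs
        # cache[k - 1] from the previous step, so in a real system they
        # would run concurrently rather than in this sequential loop.
        new_cache = [None] * num_stages
        new_cache[0] = stages[0](x)
        for k in range(1, num_stages):
            new_cache[k] = stages[k](cache[k - 1])  # stale (step t-1) input
        cache = new_cache
        x = cache[-1]
    return x
```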
autonomi-ai/nos
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud, or on dedicated AI hardware.
dvlab-research/Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
jagennath-hari/DepthStream-Accelerator-ROS2-Integrated-Monocular-Depth-Inference
DepthStream Accelerator: a TensorRT-optimized monocular depth estimation tool with C++ ROS2 integration. It delivers fast, accurate depth perception for real-time applications in robotics, autonomous vehicles, and interactive 3D environments.
marty1885/scirknn
Convert and run scikit-learn MLPs on the Rockchip NPU.
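The intended workflow is train-then-convert: fit a standard scikit-learn MLP, then export it for the NPU runtime. The scikit-learn part below is real API; the conversion call is a hypothetical stand-in for scirknn's tooling, not its actual interface.

```python
# Sketch of the train-then-convert workflow under stated assumptions.
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300).fit(X, y)

# Hypothetical conversion step (not scirknn's real API): export the trained
# MLP to an RKNN model the Rockchip NPU runtime can load.
# convert_sklearn_to_rknn(mlp, "digits_mlp.rknn")
```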
fangvv/TLEE
Code for paper "TLEE: Temporal-wise and Layer-wise Early Exiting Network for Efficient Video Recognition on Edge Devices"
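Early exiting attaches lightweight classifier heads to intermediate layers and stops computation once a prediction is confident enough. The sketch below shows the generic layer-wise idea only; TLEE's temporal-wise and layer-wise policy for video is more involved, and all names here are illustrative.

```python
# Minimal layer-wise early-exit sketch (generic technique, not TLEE itself).
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, dim=64, num_classes=10, num_blocks=4, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks)]
        )
        # One lightweight classifier head per block serves as an exit branch.
        self.exits = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(num_blocks)]
        )
        self.threshold = threshold

    def forward(self, x):
        # Assumes batch size 1 so the confidence check is a scalar.
        for block, exit_head in zip(self.blocks, self.exits):
            x = block(x)
            logits = exit_head(x)
            # Exit as soon as the prediction is confident enough,
            # skipping the remaining (more expensive) blocks.
            if logits.softmax(-1).max() >= self.threshold:
                return logits
        return logits

net = EarlyExitNet()
out = net(torch.randn(1, 64))
```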
Bisonai/ncnn
Modified inference engine for quantized convolution using product quantization
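Product quantization compresses a weight matrix by splitting each vector into subvectors, learning a small k-means codebook per subspace, and storing only centroid indices. The sketch below shows the general technique, not Bisonai's ncnn modifications.

```python
# Small product-quantization sketch (generic technique).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 64)).astype(np.float32)  # e.g. flattened conv weights

num_subspaces, codebook_size = 8, 16  # 16 centroids -> 4-bit codes
sub_dim = W.shape[1] // num_subspaces
codebooks, codes = [], []
for s in range(num_subspaces):
    sub = W[:, s * sub_dim : (s + 1) * sub_dim]
    km = KMeans(n_clusters=codebook_size, n_init=4, random_state=0).fit(sub)
    codebooks.append(km.cluster_centers_)
    codes.append(km.labels_)  # one 4-bit index per subvector

# Reconstruction: concatenate the selected centroid per subspace.
W_hat = np.hstack([codebooks[s][codes[s]] for s in range(num_subspaces)])
print("mean abs error:", np.abs(W - W_hat).mean())
```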
fangvv/MTACP
Code for paper "Deep Reinforcement Learning based Multi-task Automated Channel Pruning for DNNs"
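For orientation, channel pruning removes whole output channels from a convolution to shrink compute. The toy sketch below ranks channels by L1 weight norm and keeps the top fraction; this illustrates channel pruning generally, whereas MTACP chooses per-layer pruning ratios with deep reinforcement learning rather than a fixed rule.

```python
# Toy magnitude-based channel pruning (illustrative only, not MTACP's method).
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
keep_ratio = 0.5

# Score each of the 32 output channels by the L1 norm of its filter weights.
scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
num_keep = int(conv.out_channels * keep_ratio)
keep = torch.topk(scores, num_keep).indices.sort().values

# Build a thinner conv layer containing only the kept channels.
pruned = nn.Conv2d(16, num_keep, kernel_size=3, padding=1)
pruned.weight.data = conv.weight.data[keep]
pruned.bias.data = conv.bias.data[keep]

x = torch.randn(1, 16, 8, 8)
print(pruned(x).shape)  # (1, num_keep, 8, 8)
```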