inference-acceleration
There are 8 repositories under the inference-acceleration topic.
czg1225/AsyncDiff
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
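The core idea is to break the strictly sequential denoising loop: the denoiser is split into stages, and at step t each stage consumes the activations its predecessor produced at step t-1, so all stages can run concurrently on different devices. A minimal conceptual sketch of that pattern follows; the names (`async_denoise`, `stages`) are hypothetical and this is not the authors' implementation.

```python
# Conceptual sketch of asynchronous denoising (not AsyncDiff's actual code).
# The denoiser is split into sequential stages; at step t, stage k reads the
# output stage k-1 produced at step t-1, breaking the serial dependency so
# the stages could execute in parallel on separate devices.

def async_denoise(stages, x_T, num_steps):
    """stages: list of callables, logically one per device."""
    num_stages = len(stages)
    # Warm-up: one fully serial pass to populate the inter-stage cache.
    cache = [None] * num_stages
    h = x_T
    for k, stage in enumerate(stages):
        h = stage(h)
        cache[k] = h
    x = h
    for t in range(1, num_steps):
        # The calls below are mutually independent: stage k only needs
        # cache[k - 1] from the previous step, so in a real system they
        # would run concurrently rather than in this sequential loop.
        new_cache = [None] * num_stages
        new_cache[0] = stages[0](x)
        for k in range(1, num_stages):
            new_cache[k] = stages[k](cache[k - 1])  # stale (step t-1) input
        cache = new_cache
        x = cache[-1]
    return x
```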
autonomi-ai/nos
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud, or on dedicated AI hardware.
dvlab-research/Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
jagennath-hari/DepthStream-Accelerator-ROS2-Integrated-Monocular-Depth-Inference
DepthStream Accelerator: a TensorRT-optimized monocular depth estimation tool with C++ ROS2 integration. It delivers fast, accurate depth perception for real-time applications in robotics, autonomous vehicles, and interactive 3D environments.
marty1885/scirknn
Convert and run scikit-learn MLPs on the Rockchip NPU.
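The intended workflow is train-then-convert: fit a standard scikit-learn MLP, then export it for the NPU runtime. The scikit-learn part below is real API; the conversion call is a hypothetical stand-in for scirknn's tooling, not its actual interface.

```python
# Sketch of the train-then-convert workflow under stated assumptions.
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300).fit(X, y)

# Hypothetical conversion step (not scirknn's real API): export the trained
# MLP to an RKNN model the Rockchip NPU runtime can load.
# convert_sklearn_to_rknn(mlp, "digits_mlp.rknn")
```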
fangvv/TLEE
Code for paper "TLEE: Temporal-wise and Layer-wise Early Exiting Network for Efficient Video Recognition on Edge Devices"
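Early exiting attaches lightweight classifier heads to intermediate layers and stops computation once a prediction is confident enough. The sketch below shows the generic layer-wise idea only; TLEE's temporal-wise and layer-wise policy for video is more involved, and all names here are illustrative.

```python
# Minimal layer-wise early-exit sketch (generic technique, not TLEE itself).
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, dim=64, num_classes=10, num_blocks=4, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_blocks)]
        )
        # One lightweight classifier head per block serves as an exit branch.
        self.exits = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(num_blocks)]
        )
        self.threshold = threshold

    def forward(self, x):
        # Assumes batch size 1 so the confidence check is a scalar.
        for block, exit_head in zip(self.blocks, self.exits):
            x = block(x)
            logits = exit_head(x)
            # Exit as soon as the prediction is confident enough,
            # skipping the remaining (more expensive) blocks.
            if logits.softmax(-1).max() >= self.threshold:
                return logits
        return logits

net = EarlyExitNet()
out = net(torch.randn(1, 64))
```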
Bisonai/ncnn
Modified inference engine for quantized convolution using product quantization
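Product quantization compresses a weight matrix by splitting each vector into subvectors, learning a small k-means codebook per subspace, and storing only centroid indices. The sketch below shows the general technique, not Bisonai's ncnn modifications.

```python
# Small product-quantization sketch (generic technique).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 64)).astype(np.float32)  # e.g. flattened conv weights

num_subspaces, codebook_size = 8, 16  # 16 centroids -> 4-bit codes
sub_dim = W.shape[1] // num_subspaces
codebooks, codes = [], []
for s in range(num_subspaces):
    sub = W[:, s * sub_dim : (s + 1) * sub_dim]
    km = KMeans(n_clusters=codebook_size, n_init=4, random_state=0).fit(sub)
    codebooks.append(km.cluster_centers_)
    codes.append(km.labels_)  # one 4-bit index per subvector

# Reconstruction: concatenate the selected centroid per subspace.
W_hat = np.hstack([codebooks[s][codes[s]] for s in range(num_subspaces)])
print("mean abs error:", np.abs(W - W_hat).mean())
```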
fangvv/MTACP
Code for paper "Deep Reinforcement Learning based Multi-task Automated Channel Pruning for DNNs"
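For orientation, channel pruning removes whole output channels from a convolution to shrink compute. The toy sketch below ranks channels by L1 weight norm and keeps the top fraction; this illustrates channel pruning generally, whereas MTACP chooses per-layer pruning ratios with deep reinforcement learning rather than a fixed rule.

```python
# Toy magnitude-based channel pruning (illustrative only, not MTACP's method).
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
keep_ratio = 0.5

# Score each of the 32 output channels by the L1 norm of its filter weights.
scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
num_keep = int(conv.out_channels * keep_ratio)
keep = torch.topk(scores, num_keep).indices.sort().values

# Build a thinner conv layer containing only the kept channels.
pruned = nn.Conv2d(16, num_keep, kernel_size=3, padding=1)
pruned.weight.data = conv.weight.data[keep]
pruned.bias.data = conv.bias.data[keep]

x = torch.randn(1, 16, 8, 8)
print(pruned(x).shape)  # (1, num_keep, 8, 8)
```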