inference-acceleration

There are 8 repositories under the inference-acceleration topic.

  • czg1225/AsyncDiff

    [NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Language: Python
  • autonomi-ai/nos

⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud, or on AI hardware.

Language: Python
  • dvlab-research/Q-LLM

    This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"

Language: Python
  • jagennath-hari/DepthStream-Accelerator-ROS2-Integrated-Monocular-Depth-Inference

    DepthStream Accelerator: A TensorRT-optimized monocular depth estimation tool with ROS2 integration for C++. It offers high-speed, accurate depth perception, perfect for real-time applications in robotics, autonomous vehicles, and interactive 3D environments.

Language: Jupyter Notebook
  • marty1885/scirknn

Convert and run scikit-learn MLPs on the Rockchip NPU.

Language: Python
  • fangvv/TLEE

    Code for paper "TLEE: Temporal-wise and Layer-wise Early Exiting Network for Efficient Video Recognition on Edge Devices"

Language: Python
  • Bisonai/ncnn

A modified inference engine for quantized convolution using product quantization

Language: C++
  • fangvv/MTACP

Code for the paper "Deep Reinforcement Learning based Multi-task Automated Channel Pruning for DNNs"

Language: Python