fast-inference

There are 10 repositories under the fast-inference topic.

  • foolwood/pytorch-slimming

    Learning Efficient Convolutional Networks through Network Slimming, in ICCV 2017.

    Language: Python
  • aredden/flux-fp8-api

    Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices.

    Language: Python
  • kssteven418/BigLittleDecoder

    [NeurIPS'23] Speculative Decoding with Big Little Decoder

    Language: Python
  • dvlab-research/Q-LLM

    This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"

    Language: Python
  • romsto/Speculative-Decoding

    Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).

    Language: Python
  • lim142857/Sparsifiner

    Demo code for the CVPR 2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"

    Language: Python
  • Academich/translation-transformer

    An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks, with inference accelerated by speculative decoding.

    Language: Python
  • szemenyeim/RoboDNN

    A fast forward-only deep neural network library for Nao robots.

    Language: C++
  • u-hyszk/japanese-speculative-decoding

    Verification of the effect of speculative decoding on Japanese text.

    Language: Python
  • PopoDev/BiLD

    Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder

    Language: Python
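aredden/flux-fp8-api is built around quantized low-precision matmul. As a rough illustration of the general quantize / low-precision-accumulate / rescale pattern (a toy sketch in plain Python using integer rounding as a stand-in; it is not the repo's actual fp8 kernels, which use hardware float8 types):

```python
# Toy illustration of quantized matmul with per-tensor scaling.
# Integer rounding stands in for a low-precision format: quantize both
# operands, multiply-accumulate in the cheap format, rescale to float.

def quantize(mat, levels=127):
    """Map floats to small integers with a single per-tensor scale."""
    amax = max(abs(x) for row in mat for x in row) or 1.0
    scale = amax / levels
    q = [[round(x / scale) for x in row] for row in mat]
    return q, scale

def matmul_int(a, b):
    """Plain matmul over the quantized integer representations."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

def quantized_matmul(a, b):
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = matmul_int(qa, qb)                            # cheap accumulate
    return [[x * sa * sb for x in row] for row in acc]  # rescale to float
```

The result approximates the exact float matmul to within the quantization error; the speedup in the real implementation comes from the hardware executing the low-precision multiplies much faster than full-precision ones.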
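Several of the repositories above (romsto/Speculative-Decoding, kssteven418/BigLittleDecoder, u-hyszk/japanese-speculative-decoding) implement speculative decoding: a cheap draft model proposes several tokens, and the expensive target model verifies them in a single pass. A minimal sketch of the greedy variant, with hypothetical toy stand-ins for the two models (`draft_next` and `target_next` are placeholders, not code from any repo listed):

```python
# Toy next-token "models" over a tiny vocabulary. The draft is cheap and
# mostly agrees with the target; the target occasionally disagrees.
VOCAB = ["a", "b", "c"]

def draft_next(ctx):
    # hypothetical cheap draft model (deterministic toy rule)
    return VOCAB[len(ctx) % 2]

def target_next(ctx):
    # hypothetical expensive target model: agrees except every 5th step
    return "c" if len(ctx) % 5 == 0 else VOCAB[len(ctx) % 2]

def speculative_decode(ctx, n_tokens, k=4):
    """Greedy speculative decoding: the draft proposes k tokens, the
    target checks each proposed position; the agreeing prefix is kept,
    plus the target's own token at the first mismatch. Output is
    identical to greedy decoding with the target alone."""
    out = list(ctx)
    while len(out) - len(ctx) < n_tokens:
        # draft proposes k tokens autoregressively (cheap)
        proposal = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))
        # target verifies all k positions (one batched pass in practice)
        kept = []
        for i, tok in enumerate(proposal):
            t = target_next(out + proposal[:i])
            if t == tok:
                kept.append(tok)
            else:
                kept.append(t)  # first mismatch: take target's token, stop
                break
        out.extend(kept)
    return out[:len(ctx) + n_tokens]
```

When the draft agrees often, each expensive verification pass yields several tokens instead of one, which is the source of the speedup; the sampling-based version in Leviathan et al. (2023) extends this with rejection sampling so the output distribution matches the target exactly.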