fast-inference
There are 10 repositories under the fast-inference topic.
foolwood/pytorch-slimming
Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017).
aredden/flux-fp8-api
Flux diffusion model implementation using quantized fp8 matmuls, with the remaining layers running in half precision with fast accumulation; roughly 2x faster on consumer devices.
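The entry above relies on scaled low-precision matmuls. A conceptual sketch of the general idea, not flux-fp8-api's actual code: values are scaled into the low-precision format's representable range, rounded, accumulated, and the result rescaled. Plain Python with an int8-style range stands in for real fp8 kernels; `QMAX`, `quantize`, and `qmatvec` are illustrative names.

```python
QMAX = 127  # stand-in for the low-precision format's max magnitude

def quantize(row, qmax=QMAX):
    """Scale a vector into [-qmax, qmax] and round; return (ints, scale)."""
    scale = max(abs(x) for x in row) / qmax or 1.0  # avoid zero scale
    return [round(x / scale) for x in row], scale

def qmatvec(matrix, vec):
    """Matrix-vector product where each dot product runs on quantized values."""
    qv, sv = quantize(vec)
    out = []
    for row in matrix:
        qr, sr = quantize(row)
        acc = sum(a * b for a, b in zip(qr, qv))  # low-precision accumulate
        out.append(acc * sr * sv)                 # rescale to full precision
    return out

# quantized result stays close to the exact product
print(qmatvec([[1.0, -2.0], [0.5, 3.0]], [4.0, 1.0]))
```

The per-tensor scale is what makes the narrow format usable: each row and the input vector get their own dynamic range, and only the final accumulator is rescaled.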
kssteven418/BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
dvlab-research/Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
romsto/Speculative-Decoding
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
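Several repositories in this list implement speculative decoding: a small draft model proposes a few tokens, the large target model scores them in one pass, and each proposal is accepted with probability min(1, p/q), falling back to the residual distribution on rejection. A minimal toy sketch under stated assumptions; `target_dist` and `draft_dist` are hypothetical stand-ins for real models:

```python
import random

VOCAB = [0, 1, 2]

def target_dist(prefix):
    # toy "large" model: strongly favors (last token + 1) mod 3
    last = prefix[-1] if prefix else 0
    probs = [0.1, 0.1, 0.1]
    probs[(last + 1) % 3] = 0.8
    return probs

def draft_dist(prefix):
    # toy "small" model: same bias, less confident
    last = prefix[-1] if prefix else 0
    probs = [0.2, 0.2, 0.2]
    probs[(last + 1) % 3] = 0.6
    return probs

def sample(probs, rng):
    return rng.choices(VOCAB, weights=probs, k=1)[0]

def speculative_step(prefix, k, rng):
    # 1) draft model proposes k tokens autoregressively
    drafted, q_probs, ctx = [], [], list(prefix)
    for _ in range(k):
        q = draft_dist(ctx)
        t = sample(q, rng)
        drafted.append(t); q_probs.append(q); ctx.append(t)
    # 2) target model scores the proposals (in practice, one parallel pass)
    accepted, ctx = [], list(prefix)
    for t, q in zip(drafted, q_probs):
        p = target_dist(ctx)
        if rng.random() < min(1.0, p[t] / q[t]):
            accepted.append(t); ctx.append(t)
        else:
            # on rejection, resample from the residual max(0, p - q)
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            z = sum(residual)
            probs = [r / z for r in residual] if z > 0 else p
            accepted.append(sample(probs, rng))
            return accepted
    # all k accepted: the target model samples one bonus token for free
    accepted.append(sample(target_dist(ctx), rng))
    return accepted

rng = random.Random(0)
out = [0]
while len(out) < 12:
    out.extend(speculative_step(out, k=4, rng=rng))
print(out)
```

The accept/reject rule makes the sampled sequence distributed exactly as if the target model had generated every token itself; the speedup comes from accepting several draft tokens per target-model pass.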
lim142857/Sparsifiner
Demo code for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
Academich/translation-transformer
An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks, with inference accelerated by speculative decoding.
szemenyeim/RoboDNN
Fast Forward-Only Deep Neural Network Library for the Nao Robots
u-hyszk/japanese-speculative-decoding
Verification of the effect of speculative decoding on Japanese text generation.
PopoDev/BiLD
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder