fast-inference
There are 10 repositories under the fast-inference topic.
foolwood/pytorch-slimming
Learning Efficient Convolutional Networks through Network Slimming (ICCV 2017).
aredden/flux-fp8-api
Flux diffusion model implementation using quantized fp8 matmuls, with the remaining layers running in half precision with fast accumulation; roughly 2x faster on consumer devices.
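The entry above relies on scaled low-precision matmuls. A conceptual sketch of the general idea, not flux-fp8-api's actual code: values are scaled into the low-precision format's representable range, rounded, accumulated, and the result rescaled. Plain Python with an int8-style range stands in for real fp8 kernels; `QMAX`, `quantize`, and `qmatvec` are illustrative names.

```python
QMAX = 127  # stand-in for the low-precision format's max magnitude

def quantize(row, qmax=QMAX):
    """Scale a vector into [-qmax, qmax] and round; return (ints, scale)."""
    scale = max(abs(x) for x in row) / qmax or 1.0  # avoid zero scale
    return [round(x / scale) for x in row], scale

def qmatvec(matrix, vec):
    """Matrix-vector product where each dot product runs on quantized values."""
    qv, sv = quantize(vec)
    out = []
    for row in matrix:
        qr, sr = quantize(row)
        acc = sum(a * b for a, b in zip(qr, qv))  # low-precision accumulate
        out.append(acc * sr * sv)                 # rescale to full precision
    return out

# quantized result stays close to the exact product
print(qmatvec([[1.0, -2.0], [0.5, 3.0]], [4.0, 1.0]))
```

The per-tensor scale is what makes the narrow format usable: each row and the input vector get their own dynamic range, and only the final accumulator is rescaled.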
kssteven418/BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
dvlab-research/Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
romsto/Speculative-Decoding
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
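Several repositories in this list implement speculative decoding: a small draft model proposes a few tokens, the large target model scores them in one pass, and each proposal is accepted with probability min(1, p/q), falling back to the residual distribution on rejection. A minimal toy sketch under stated assumptions; `target_dist` and `draft_dist` are hypothetical stand-ins for real models:

```python
import random

VOCAB = [0, 1, 2]

def target_dist(prefix):
    # toy "large" model: strongly favors (last token + 1) mod 3
    last = prefix[-1] if prefix else 0
    probs = [0.1, 0.1, 0.1]
    probs[(last + 1) % 3] = 0.8
    return probs

def draft_dist(prefix):
    # toy "small" model: same bias, less confident
    last = prefix[-1] if prefix else 0
    probs = [0.2, 0.2, 0.2]
    probs[(last + 1) % 3] = 0.6
    return probs

def sample(probs, rng):
    return rng.choices(VOCAB, weights=probs, k=1)[0]

def speculative_step(prefix, k, rng):
    # 1) draft model proposes k tokens autoregressively
    drafted, q_probs, ctx = [], [], list(prefix)
    for _ in range(k):
        q = draft_dist(ctx)
        t = sample(q, rng)
        drafted.append(t); q_probs.append(q); ctx.append(t)
    # 2) target model scores the proposals (in practice, one parallel pass)
    accepted, ctx = [], list(prefix)
    for t, q in zip(drafted, q_probs):
        p = target_dist(ctx)
        if rng.random() < min(1.0, p[t] / q[t]):
            accepted.append(t); ctx.append(t)
        else:
            # on rejection, resample from the residual max(0, p - q)
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            z = sum(residual)
            probs = [r / z for r in residual] if z > 0 else p
            accepted.append(sample(probs, rng))
            return accepted
    # all k accepted: the target model samples one bonus token for free
    accepted.append(sample(target_dist(ctx), rng))
    return accepted

rng = random.Random(0)
out = [0]
while len(out) < 12:
    out.extend(speculative_step(out, k=4, rng=rng))
print(out)
```

The accept/reject rule makes the sampled sequence distributed exactly as if the target model had generated every token itself; the speedup comes from accepting several draft tokens per target-model pass.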
lim142857/Sparsifiner
Demo code for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
Academich/translation-transformer
An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks, with inference accelerated by speculative decoding.
szemenyeim/RoboDNN
Fast Forward-Only Deep Neural Network Library for the Nao Robots
u-hyszk/japanese-speculative-decoding
Verification of the effect of speculative decoding on Japanese text generation.
PopoDev/BiLD
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder