IST-DASLab/marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
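The ~4x figure is the natural ceiling for a memory-bound matmul, because INT4 weights occupy a quarter of the bytes of FP16 weights. The back-of-the-envelope sketch below estimates that ceiling by assuming runtime is proportional to bytes moved. It counts weights plus FP16 input and output activations, optionally adds one FP16 scale per group of INT4 weights, and ignores compute entirely. The function names and the group-scale layout are hypothetical illustrations, not part of the marlin package. Because compute is ignored, the model only explains why the ideal is ~4x at small batches. It does not predict where the kernel stops being memory-bound.

```python
from typing import Optional


def traffic_bytes(k: int, n: int, batch: int, weight_bits: float) -> float:
    """Bytes moved for a (batch x k) @ (k x n) matmul: weights + FP16 inputs + FP16 outputs."""
    weights = k * n * weight_bits / 8
    activations = batch * k * 2  # FP16 input activations, 2 bytes each
    outputs = batch * n * 2      # FP16 output activations, 2 bytes each
    return weights + activations + outputs


def memory_bound_speedup(k: int, n: int, batch: int, group_size: Optional[int] = None) -> float:
    """Idealized FP16xINT4 speedup if runtime were proportional to bytes moved.

    Hypothetical model: compute cost is ignored, so this only describes the
    memory-bound regime (small batches). If group_size is set, one FP16 scale
    per group of INT4 weights is added to the weight storage.
    """
    int4_bits = 4.0
    if group_size is not None:
        int4_bits += 16 / group_size  # amortized cost of a per-group FP16 scale
    return traffic_bytes(k, n, batch, 16) / traffic_bytes(k, n, batch, int4_bits)


if __name__ == "__main__":
    # Speedup ceiling for a 4096 x 4096 layer at a few batch sizes.
    for b in (1, 16, 32):
        print(b, round(memory_bound_speedup(4096, 4096, b, group_size=128), 2))
```

Growing the batch adds activation traffic that INT4 does not shrink, and per-group scales add a fraction of a bit per weight. Both effects pull the ceiling slightly below 4x.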

Python · Apache-2.0 license
