IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Python · Apache-2.0
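The "FP16xINT4" in the description refers to multiplying FP16 activations against weights stored as 4-bit integers. As a rough illustration only (not Marlin's actual CUDA implementation, and all function names here are hypothetical), the core idea can be sketched in plain Python: pack two 4-bit weights per byte, keep a scale and zero-point, and dequantize before the multiply.

```python
# Hypothetical sketch of the FP16xINT4 idea; Marlin's real kernel does this
# fused on the GPU with per-group scales, not element-by-element in Python.

def pack_int4(values):
    """Pack an even-length list of 4-bit values (0..15) into bytes."""
    return bytes(values[i] | (values[i + 1] << 4) for i in range(0, len(values), 2))

def unpack_int4(packed):
    """Unpack bytes back into 4-bit values."""
    out = []
    for b in packed:
        out.append(b & 0x0F)
        out.append(b >> 4)
    return out

def quantize(weights):
    """Asymmetric 4-bit quantization with a single scale and zero-point."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0  # 15 = max 4-bit value
    q = [round((w - lo) / scale) for w in weights]
    return q, scale, lo

def dequantize(q, scale, zero):
    return [v * scale + zero for v in q]

weights = [0.5, -0.25, 1.0, -1.0]
q, scale, zero = quantize(weights)
packed = pack_int4(q)                  # 2 bytes instead of 4 full-width floats
w_hat = dequantize(unpack_int4(packed), scale, zero)

activations = [1.0, 2.0, 3.0, 4.0]
y = sum(w * x for w, x in zip(w_hat, activations))  # "mixed-precision" dot product
```

The memory saving (4 bits per weight plus a small scale overhead) is what drives the speedup at low batch sizes, where LLM inference is bound by weight loading rather than arithmetic.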
Stargazers
- abacaj (software eng building things)
- banda-larga
- BlueKiji77
- buaabai (BUAA)
- catid (Austin, TX)
- danthe3rd
- data-panda (Eindhoven University of Technology)
- DominikScholz
- efrantar (OpenAI)
- erichocean (Xy Group Ltd)
- fishelegs
- gm8xx8
- hoagy-davis-digges
- jph00 (@answerdotai)
- Ki6an
- kmchiti (Mila)
- kyegomez (Swarms)
- liujingcs (Monash University)
- matiasED (Córdoba, Argentina)
- mgoin (@neuralmagic)
- nateraw (@huggingface)
- pabl-o-ce
- pamolchanov (NVIDIA)
- Rane2021
- robertgshaw2-neuralmagic (@neuralmagic)
- Ryu1845
- SandalotsVolcanak
- ShawonAshraf (ellamind GmbH)
- specblades
- SunMarc (Hugging Face)
- tokestermw (Cresta)
- u-brixton
- vgoklani (New York, NY)
- vidalmaxime
- xzyaoi (ETH Zurich / @eth-easl)
- zhuohan123 (UC Berkeley)