FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
Language: Python · License: Apache-2.0
Stargazers
- antinucleon (Facebook)
- Azeirah (Student)
- BojanFaletic (searching for projects)
- briancpark (@Apple)
- chhzh123 (Cornell University)
- cleardusk (KwaiVGI, Kuaishou Tech. << PhD@CASIA)
- jiangsy
- johnpaulbin
- jonasmeisner
- josephrocca (Singapore)
- liu-mengyang (Southeast University)
- liuzhuang1024 (TJNU)
- lorenmt (Meta)
- lygztq (bytedance)
- Manchery (Tsinghua University)
- merrymercy (xAI)
- mikelittman (Austin, TX)
- mlnv (Munich)
- mryab
- mufeili (AWS AI Lab Shanghai)
- nightlyworker
- nikitavoloboev (Madrid)
- olliestanley (United Kingdom)
- PKUFlyingPig (Peking University)
- Prasanth-BS
- rentainhe (IDEA)
- Sandalots (Volcanak)
- Subarasheese
- sudoskys (Undefined)
- Systemcluster (Anlatan)
- tqchen (CMU, OctoML)
- trisongz (Growth Engine AI)
- verdverm (@topicalsource)
- xzyaoi (ETH Zurich / @eth-easl)
- Ying1123 (Stanford University)
- zincnode (Samsung)