FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
Python · Apache-2.0
Stargazers
- antinucleon (Facebook)
- Azeirah (Student)
- BojanFaletic (searching for projects)
- briancpark (NCSU)
- chhzh123 (Cornell University)
- cleardusk (Y-tech, Kwai Inc. << PhD@CASIA)
- jiangsy
- johnpaulbin
- jonasmeisner
- josephrocca (Singapore)
- liu-mengyang (Southeast University)
- liuzhuang1024 (TJNU)
- lorenmt (Imperial College London)
- lygztq (bytedance)
- Manchery (Tsinghua University)
- merrymercy (UC Berkeley)
- mikelittman (Austin, TX)
- mlnv (Munich)
- mryab
- mufeili (AWS AI Lab Shanghai)
- nightlyworker
- nikitavoloboev (Tbilisi)
- olliestanley (United Kingdom)
- PKUFlyingPig (Peking University)
- Prasanth-BS
- rentainhe (IDEA)
- Sandalots (Volcanak)
- Subarasheese
- sudoskys (Songur Studio)
- Systemcluster (Anlatan)
- tqchen (CMU, OctoML)
- trisongz (Growth Engine AI)
- verdverm (@topicalsource)
- xzyaoi (ETH Zurich / @eth-easl)
- Ying1123 (Stanford University)
- zincnode (Samsung)