FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
Python · Apache-2.0
Watchers
- bilalghalib (GEMSI.org)
- dagelf (Where Innovation Serves Humans)
- dgiunchi (UCL)
- dotneet (COMPASS Inc.)
- dsoul
- gradetwo
- granin
- ibrahimishag (Ensol Biosciences Inc)
- inniyah (Debian)
- ip01
- leeoniya (L6 @grafana)
- levigross (New York)
- levitation (Simplify / Macrotec LLC)
- mbofb
- michalwols (New York)
- nborbit (London)
- Nek (@zero-plus-x)
- nkmry (Tokyo, Japan)
- nxtreaming
- onexuan
- pjaaskel (Intel Corporation & Tampere University)
- rozgo (VertexStudio)
- shafiqalibhai (DeployView Limited)
- shantanusharma (Sharma Labs)
- sheshuguang
- shoheitanaka (Artisan Workshop)
- Show1 (SPUTNIKs Organization)
- ssergio198 (Earth)
- StavrosD (Stavros Dimopoulos)
- strategist922 (Microsoft)
- take-cheeze (Azumino, Japan)
- timothyklim
- trycatcher
- vid (systems for people)
- xdef (Moscow, Russia)
- zhuangh (Tesla)