b4rtaz/distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
C++ · MIT License
Stargazers
- abacaj (software eng building things)
- akhilcacharya (Cambridge, MA)
- aksssonamrao
- alexl83
- amirrezasalimi (Toolstack)
- arch-btw
- BruceMacD (@ollama)
- bullno1 (Singapore)
- cesarandreslopez
- Curiosity007
- edgan
- ekg (University of Tennessee Health Science Center (UTHSC))
- Frischifrisch (Long-term unemployed)
- fungiboletus (@SINTEF)
- ghchris2021
- gklein (Red Hat)
- ground7
- IoTeacher (TECNM Campus Instituto Tecnológico de Tijuana)
- jddunn (New York City)
- jfgonsalves (Monash Health)
- joshuafuller
- koumoua01 (Unknown)
- logikstate
- lolxdmainkaisemaanlu
- martinwepner (@wepner-tech)
- MikeLP (air-lab)
- paularch0
- rongzhou
- seanjensengrey (Seattle, WA)
- shahizat (issai.nu.edu.kz)
- Slach (Altinity)
- the-crypt-keeper
- thomasvilly
- tkersey (@thisisartium)
- Venkman42
- xhlsa