b4rtaz/distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
Language: C++ · License: MIT