TorchMoE/MoE-Infinity

PyTorch library for cost-effective, fast and easy serving of MoE models.

Python · Apache-2.0

Issues

Question: Support for Continuous Batching and Asynchronous Requests
#25 opened 5 months ago by Msiavashi
1
Can the MoE-Infinity framework be used in conjunction with the vLLM framework?
#23 opened 6 months ago by alphabewitch
1
Run on multiple GPUs
#15 opened 7 months ago by YLSnowy
3
CPU memory problem when using gptq quantization
#28 opened 3 months ago by JustQJ
0
RuntimeError: CUDA error: invalid device ordinal. When I run script.py, I meet the error below.
#27 opened 4 months ago by Tingberer
2
Readme Example not working (MemoryError: std::bad_alloc)
#26 opened 5 months ago by akhauriyash
1
CUDA extension not installed Error while running readme_example.py
#24 opened 5 months ago by Msiavashi
4
Output of Mixtral-8x7B is strange
#16 opened 7 months ago by JustQJ
2
TODO for first release
#1 opened 10 months ago by drunkcoding
0
How to Install it?
#10 opened 7 months ago by MSGitt
2
Install from pip failed
#11 opened 7 months ago by future-xy
2
Grok-1 Support
#8 opened 8 months ago by drunkcoding
0
MoE-Infinity API Proposal
#2 opened 8 months ago by drunkcoding
1
Support Constrained Server Memory
#5 opened 9 months ago by drunkcoding
0