microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Python · Apache-2.0
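MII exposes a simple pipeline API for non-persistent inference. A minimal sketch, assuming a recent `deepspeed-mii` release and using `mistralai/Mistral-7B-v0.1` as an illustrative model name (any supported Hugging Face model works):

```python
import mii

# Load a Hugging Face model into a non-persistent inference pipeline.
# The model name below is illustrative; substitute any supported architecture.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Batched text generation; max_new_tokens bounds the generated length.
response = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)
```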
Issues
[BUG] MII Backend Hangs After 9999 Exceptions in `MIIAsyncPipeline.put_request`
#489 opened by weiqisun (1 comment)
Support streaming
#488 opened by ZZhangxian (0 comments)
Support Qwen1.5
#487 opened by ZZhangxian (0 comments)
Support Qwen
#486 opened by ZZhangxian (0 comments)
Stronger support for LLaVA-NeXT
#483 opened by thesby (0 comments)
TF32 support
#481 opened by Chasapas (0 comments)
Can DeepSpeed-MII load quantized int4 or int8 models?
#479 opened by wangyongpenga (0 comments)
Does deepspeed-mii support prefix_allowed_tokens_fn?
#477 opened by zcakzhuu (3 comments)
Qwen1.5 model support?
#442 opened by musexiaoluo (1 comment)
Is the OpenAI-compatible server still working?
#459 opened by RobinQu (0 comments)
[REQUEST] LLAMA-3 support
#475 opened by MRYingLEE (0 comments)
[REQUEST] Mixtral-8x22B support
#474 opened by y-live-koba (2 comments)
BUG in run_batch_processing
#471 opened by zhihui96 (1 comment)
[FEATURE] Access to logits and final hidden layer
#463 opened by lshamis (1 comment)
ValueError: Unsupported model type phi3
#469 opened by abpani (3 comments)
Error when using Qwen1.5-32B
#468 opened by puppet101 (0 comments)
Performance compared with vLLM
#467 opened by littletomatodonkey (0 comments)
RuntimeError: The server socket has failed to listen on any local network address
#464 opened by thesby (2 comments)
How is prompt segmentation implemented for Dynamic SplitFuse? Is there a code implementation or snippet?
#462 opened by wenyangchou (1 comment)
[FEATURE REQUEST] Add Support for Qwen1.5-MoE Architecture in DeepSpeed-MII
#457 opened by freQuensy23-coder (6 comments)
inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii
#452 opened by Andronixs (1 comment)
Add support for Gemma models
#425 opened by lullabies777 (0 comments)
Cohere's Command-R model support
#446 opened by gottlike (1 comment)
Workarounds for pre-Ampere devices
#437 opened by jinhachung (0 comments)
Add support for DBRX
#455 opened by azaccor (0 comments)
Any plans for production-ready services?
#454 opened by SeungminHeo (2 comments)
Limit VRAM usage when serving the model
#453 opened by risedangel (7 comments)
[BUG] Issue serving Mixtral 8x7B on H100
#443 opened by Rogerwyf (2 comments)
Quantization inference
#439 opened by freQuensy23-coder (0 comments)
I can't tell from the documentation if we're meant to use a chat template or if it's applied automatically
#448 opened by sidagarwal2805 (3 comments)
[NEED HELP] Quantization inference
#440 opened by freQuensy23-coder (2 comments)
ValueError: Unsupported model type roberta
#432 opened by pradeepdev-1995 (2 comments)
How to use DeepSpeed-MII to deploy an LLM from DeepSpeed/Megatron-DeepSpeed trained checkpoints?
#430 opened by Jye-525 (2 comments)
Server crashed for some reason, unable to proceed
#444 opened by Archmilio (2 comments)
requests.exceptions.ConnectionError
#427 opened by Weigaa (0 comments)
What is the exact meaning of forward tokens?
#438 opened by frankxyy (0 comments)
Kernel execution error with long context length
#436 opened by qiangxu1996 (0 comments)
MII example shows that MII is "slower" than the baseline!
#431 opened by Weigaa (2 comments)
Speeding up loading of inference checkpoints
#426 opened by amritap-ef (5 comments)
When I start the server, after loading the model, I get a 'grpc.aio._call.AioRpcError'
#424 opened by zzz0906
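Several of the issues above (#459, #464, #444) concern the persistent serving mode rather than the one-off pipeline. A minimal sketch of starting, querying, and tearing down a persistent deployment, again with an illustrative model name:

```python
import mii

# Start a persistent gRPC deployment hosting the model (illustrative name).
client = mii.serve("mistralai/Mistral-7B-v0.1")

# Send generation requests to the running deployment.
response = client.generate(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)

# Any process can reconnect to the deployment by model name
# and shut the server down when finished.
client = mii.client("mistralai/Mistral-7B-v0.1")
client.terminate_server()
```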