microsoft/DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Python · Apache-2.0
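MII exposes a simple pipeline API for non-persistent inference. A minimal sketch, assuming a recent `deepspeed-mii` release and using `mistralai/Mistral-7B-v0.1` as an illustrative model name (any supported Hugging Face model works):

```python
import mii

# Load a Hugging Face model into a non-persistent inference pipeline.
# The model name below is illustrative; substitute any supported architecture.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# Batched text generation; max_new_tokens bounds the generated length.
response = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)
```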
Issues
[BUG] MII Backend Hangs After 9999 Exceptions in `MIIAsyncPipeline.put_request`
#489 opened by weiqisun (1 comment)
Support streaming
#488 opened by ZZhangxian (0 comments)
Support Qwen1.5
#487 opened by ZZhangxian (0 comments)
Support Qwen
#486 opened by ZZhangxian (0 comments)
Stronger support for LLaVA-NeXT
#483 opened by thesby (0 comments)
TF32 support
#481 opened by Chasapas (0 comments)
Can DeepSpeed-MII load quantized int4 or int8 models?
#479 opened by wangyongpenga (0 comments)
Does deepspeed-mii support prefix_allowed_tokens_fn?
#477 opened by zcakzhuu (3 comments)
Qwen1.5 model support?
#442 opened by musexiaoluo (1 comment)
Is the OpenAI-compatible server still working?
#459 opened by RobinQu (0 comments)
[REQUEST] LLAMA-3 support
#475 opened by MRYingLEE (0 comments)
[REQUEST] Mixtral-8x22B support
#474 opened by y-live-koba (2 comments)
BUG in run_batch_processing
#471 opened by zhihui96 (1 comment)
[FEATURE] Access to logits and final hidden layer
#463 opened by lshamis (1 comment)
ValueError: Unsupported model type phi3
#469 opened by abpani (3 comments)
Error when using Qwen1.5-32B
#468 opened by puppet101 (0 comments)
Performance compared with vLLM
#467 opened by littletomatodonkey (0 comments)
RuntimeError: The server socket has failed to listen on any local network address
#464 opened by thesby (2 comments)
How is prompt segmentation implemented for Dynamic SplitFuse? Is there a code implementation or snippet?
#462 opened by wenyangchou (1 comment)
[FEATURE REQUEST] Add Support for Qwen1.5-MoE Architecture in DeepSpeed-MII
#457 opened by freQuensy23-coder (6 comments)
inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii
#452 opened by Andronixs (1 comment)
Add support for Gemma models
#425 opened by lullabies777 (0 comments)
Cohere's Command-R model support
#446 opened by gottlike (1 comment)
Workarounds for pre-Ampere devices
#437 opened by jinhachung (0 comments)
Add support for DBRX
#455 opened by azaccor (0 comments)
Any plans for production-ready services?
#454 opened by SeungminHeo (2 comments)
Limit VRAM usage when serving the model
#453 opened by risedangel (7 comments)
[BUG] Issue serving Mixtral 8x7B on H100
#443 opened by Rogerwyf (2 comments)
Quantization inference
#439 opened by freQuensy23-coder (0 comments)
I can't tell from the documentation if we're meant to use a chat template or if it's applied automatically
#448 opened by sidagarwal2805 (3 comments)
[NEED HELP] Quantization inference
#440 opened by freQuensy23-coder (2 comments)
ValueError: Unsupported model type roberta
#432 opened by pradeepdev-1995 (2 comments)
How to use DeepSpeed-MII to deploy an LLM from DeepSpeed/Megatron-DeepSpeed trained checkpoints?
#430 opened by Jye-525 (2 comments)
Server crashed for some reason, unable to proceed
#444 opened by Archmilio (2 comments)
requests.exceptions.ConnectionError
#427 opened by Weigaa (0 comments)
What is the exact meaning of forward tokens?
#438 opened by frankxyy (0 comments)
Kernel execution error with long context length
#436 opened by qiangxu1996 (0 comments)
MII example shows that MII is "slower" than the baseline!
#431 opened by Weigaa (2 comments)
Speeding up loading of inference checkpoints
#426 opened by amritap-ef (5 comments)
When I start the server, after loading the model, I get a 'grpc.aio._call.AioRpcError'
#424 opened by zzz0906
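Several of the issues above (#459, #464, #444) concern the persistent serving mode rather than the one-off pipeline. A minimal sketch of starting, querying, and tearing down a persistent deployment, again with an illustrative model name:

```python
import mii

# Start a persistent gRPC deployment hosting the model (illustrative name).
client = mii.serve("mistralai/Mistral-7B-v0.1")

# Send generation requests to the running deployment.
response = client.generate(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)

# Any process can reconnect to the deployment by model name
# and shut the server down when finished.
client = mii.client("mistralai/Mistral-7B-v0.1")
client.terminate_server()
```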