Issues
How to calculate the memory reads and writes for attention calculation in the decoding phase?
#49 opened by azamikram - 1
Error in building SwiftTransformers (error: more than one conversion function from "half" to a built-in type applies)
#46 opened by lylcyl - 8
Decode Wrong Token
#16 opened by sitabulaixizawaluduo - 1
Can DistServe change block_size?
#47 opened by WANGHanshuo1220 - 0
Which library is used under the hood for cross-group KV cache transfer?
#22 opened by CSEEduanyu - 2
Model inference output is garbled; how can this be fixed?
#40 opened by liweiqing1997 - 0
Adding a new model?
#43 opened by gursimar - 1
not distserve.simulator.utils
#42 opened by TZHelloWorld - 12
decoder.embed_tokens.weight.pt not found
#10 opened by llx-08 - 5
What does pp_cross mean in the simulator output?
#41 opened by xshqhua - 5
SwiftTransformer compilation fails
#37 opened by FredHuang99 - 13
Fail to run examples/offline.py, unable to download the model to reproduce
#35 opened by William12github - 3
Question about support for disaggregated deployment with multiple prefill instances and multiple decode instances
#27 opened by Lin-Qingyang-Alec - 2
Support autoscaled prefill/decode servers
#28 opened by liurupeng - 4
Does DistServe support heterogeneous inference?
#34 opened by RobertLou - 0
CMake build fails
#31 opened by hyuenmin-choi - 5
codellama34b TTFT latency issue
#19 opened by sitabulaixizawaluduo - 2
SwiftTransformer CMake build keeps looping
#25 opened by lcvcl - 2
Great work!
#20 opened by irasin - 0
Model not loaded error
#24 opened by melissadu-db - 0
SwiftTransformer compilation fails with ambiguous conversion error in the PyTorch 24.05 container.
#21 opened by piotrm-nvidia - 5
Offline.py LLMEngine.__init__() missing 1 required positional argument: 'simulator_config'
#15 opened by fivebamboo694 - 6
Some questions about TTFT
#18 opened by YLSnowy - 0
Why does the environment setup mix conda and pip? Can't it use pip only?
#17 opened by CSEEduanyu - 8
How to profile
#13 opened by YLSnowy - 3
offline.py needs simulator_config
#5 opened by YLSnowy - 1