Accuracy degrading in concurrent scenario

Question

Opened this issue 22 days ago · 1 comment

Hi, I have verified that with a concurrency of 1, accuracy is as expected. However, as concurrency increases, accuracy degrades. I have checked that no out-of-memory (OOM) error occurred during decoding, and the log does not show any exceptions either.

The model is qwen2-7b-awq.

Answer 1 · 2024-08-25T04:47:39.000Z

@frankxyy

What is your environment? Please share the output of python3 -m sglang.check_env
Do you use a list as input? If so, we fixed a bug in #1199
Try disabling CUDA graph padding by adding --disable-cuda-graph-padding
Could you provide a reproducible script?
We check model accuracy on every commit in our CI (https://github.com/sgl-project/sglang/actions/runs/10539420947/job/29202983543), so it should catch most problems. Your model, however, is not tested in CI. If it is important, we can track it in CI as well; you can send a PR to add it.
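
For the reproducible script requested above, a minimal sketch might look like the following. It sends the same prompts once sequentially and once concurrently with greedy decoding, then reports which outputs changed. The endpoint URL, port, and payload shape in http_generate are assumptions about a locally launched server and should be adjusted to the actual setup; the helper names run_at_concurrency and find_mismatches are hypothetical and not part of sglang.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def http_generate(prompt: str, url: str = "http://localhost:30000/generate") -> str:
    """Send one greedy-decoding request.

    ASSUMPTION: the URL, port, and JSON fields below describe a locally
    running server; change them to match your deployment.
    """
    payload = {
        "text": prompt,
        "sampling_params": {"temperature": 0, "max_new_tokens": 64},
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["text"]


def run_at_concurrency(
    generate: Callable[[str], str], prompts: List[str], concurrency: int
) -> List[str]:
    """Run all prompts with at most `concurrency` requests in flight.

    pool.map returns results in the same order as `prompts`.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(generate, prompts))


def find_mismatches(
    generate: Callable[[str], str], prompts: List[str], concurrency: int
) -> List[int]:
    """Return indices of prompts whose output differs from the sequential run."""
    baseline = run_at_concurrency(generate, prompts, 1)
    concurrent = run_at_concurrency(generate, prompts, concurrency)
    return [i for i, (a, b) in enumerate(zip(baseline, concurrent)) if a != b]
```

To run it against a live server, build a list of prompts and call find_mismatches(http_generate, prompts, concurrency=16). Even with temperature 0, small differences can come from batch-dependent floating-point effects, so a mismatch is a signal to investigate rather than proof of a bug.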