sgl-project/sglang

Accuracy degrades under concurrent requests

Hi, in my tests the accuracy is as expected when the concurrency is 1, but it degrades as the concurrency increases. I have checked that no out-of-memory error occurred during decoding, and the log does not show any exceptions either.

The model is qwen2-7b-awq.

@frankxyy

  1. What is your environment? Please run python3 -m sglang.check_env and share the output.
  2. Do you use a list as input? If so, we fixed a bug for that case in #1199.
  3. Try disabling CUDA graph padding by adding --disable-cuda-graph-padding.
  4. Could you provide a reproducible script?
  5. We check model accuracy on every commit in our CI (https://github.com/sgl-project/sglang/actions/runs/10539420947/job/29202983543), so it should catch most problems. Your model is not covered by CI, though. If it is important, we can track it there as well; you are welcome to send a PR that adds it.
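
For item 4, a minimal sketch of what a reproducible script could look like: it sends the same prompts once sequentially and once with several worker threads, then reports which prompts produced different text. The helper names (run_batch, find_mismatches, sglang_generate) are hypothetical. The URL, port, and max_new_tokens value are assumptions about a locally launched server using the native /generate endpoint, so adjust them to your setup.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_batch(generate: Callable[[str], str], prompts: List[str], concurrency: int) -> List[str]:
    """Send every prompt through `generate` using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in the same order as `prompts`.
        return list(pool.map(generate, prompts))


def find_mismatches(generate: Callable[[str], str], prompts: List[str], concurrency: int) -> List[int]:
    """Return indices of prompts whose output differs between concurrency 1 and `concurrency`."""
    baseline = run_batch(generate, prompts, 1)
    concurrent = run_batch(generate, prompts, concurrency)
    return [i for i, (a, b) in enumerate(zip(baseline, concurrent)) if a != b]


def sglang_generate(prompt: str, url: str = "http://localhost:30000/generate") -> str:
    """Hypothetical client: assumes a server at `url` that accepts this JSON body."""
    payload = {
        "text": prompt,
        # Greedy decoding, so sampling randomness does not hide the effect.
        "sampling_params": {"temperature": 0, "max_new_tokens": 64},
    }
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

With the server running, find_mismatches(sglang_generate, prompts, 16) returns the indices of the prompts whose output changed under load. Batched execution can change floating-point results slightly even with greedy decoding, so a few mismatches are a lead to investigate rather than proof of a bug. Attaching the prompts and the differing outputs to the issue would make the report much easier to act on.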