sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python · Apache-2.0
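Many of the issues below involve `sglang.launch_server` and the OpenAI-compatible endpoint. For context, here is a minimal sketch of how a server is typically launched and queried; the model path, port, and `model="default"` value are placeholder assumptions, not taken from this page.

```python
# Launch the server from a shell (assumed invocation; model path and port are placeholders):
#   python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30000

# Query the OpenAI-compatible endpoint exposed by the server.
import openai

client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="default",  # assumed model alias for the single served model
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```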
Issues

- [Bug] Server crashes after loading (Mixtral 8x7b) on L4 (#1191, opened by nivibilla)
- [Feature] Context Caching (#1248, opened by RonanKMcGovern)
- [Bug] sglang.launch_server error (#1275, opened by andyluo7)
- [Bug] RuntimeError in ModelTpServer (#1323, opened by Lzhang-hub)
- [Feature] support nightly eval (#1272, opened by zhyncs)
- [Bug] T4 Crash (#1325, opened by Abdulhanan535)
- [Bug] Using 8 H20 GPUs, the deepseek-coder-v2-fp8 starts up normally, but there is no response to client requests. (#1329, opened by fengyang95)
- [Bug] Lower single request speed with mla enabled (#1264, opened by halexan)
- [Bug] Unable to fix model output (#1316, opened by cherishhh)
- [Bug] gen with regex: Token fusion between input and output, try to avoid this by removing the space at the end of the input. (#1312, opened by alanxmay)
- [Bug] Facing Error When starting. (#1321, opened by Abdulhanan535)
- [Feature] support smooth-quant? (#1322, opened by Lzhang-hub)
- [Bug] Device-side assert triggered in logits processor when running Llama 3.1 70B (#1274, opened by hrukalive)
- [Bug] Bad outputs with fp8 quantization at high RPS (#1195, opened by siddhatiwari)
- [Bug] subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpx4yubctp/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpx4yubctp/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-lcuda', '-L/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/backends/nvidia/lib' (#1240, opened by ArtificialZeng)
- [Bug] get jammed when deploy Qwen2-72b: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d ' (#1238, opened by ArtificialZeng)
- [Feature] Support phi-3 model (#1283, opened by ByronHsu)
- [Bug] A100 PCIE torch compile error (#1301, opened by zhyncs)
- [Feature] Correctness test for Triton kernels (#1292, opened by ByronHsu)
- [Feature] add option to use liger triton kernel (#1216, opened by binarycrayon)
- [Feature] support long context eval and benchmark (#1273, opened by zhyncs)
- [Feature] support ultravox (#1271, opened by zhyncs)
- [Bug] OpenAI Compatible Prompt Template Error (#1265, opened by BabyChouSr)
- [Bug] cannot set --load-format=dummy with vllm 0.5.5 (#1259, opened by lxww302)
- [Bug] incorrect input_tokens_logprob slicing in RuntimeEndpoint.select method (#1257, opened by jeffrey-fong)
- [Bug] Error in loading Qwen2-57B-A14B-Instruct (#1251, opened by LucienShui)
- Accuracy degrading in concurrent scenario (#1203, opened by frankxyy)
- [Bug] AttributeError: 'ScheduleBatch' object has no attribute 'sample' WHEN I DO Benchmarking (#1241, opened by ArtificialZeng)
- [Feature] Use Embedding/Generation Model to get its Generation/Embedding (#1200, opened by zhaochenyang20)
- [Bug] Empty `top_logprobs` in LogProbs Output for Meta-Llama-3.1-8B-Instruct Model when Using OpenAI Compatible API (#1176, opened by GuanghaoYe)
- [Feature] Jamba 1.5 Support PLS (#1190, opened by nivibilla)
- [Bug] enable-torch-compile error (#1196, opened by siddhatiwari)
- No such file or directory: '/sbin/ldconfig' (#1226, opened by zwc163)
- [Bug] vllm updated its get_model function (#1183, opened by zhaochenyang20)
- [Feature] Repeated generation expression (#1175, opened by laurens-gs)
- [Help wanted] Does RadixAttention have anything to do with attention? (#1181, opened by Wanglongzhi2001)
- [Bug] Runtime Stuck (#1173, opened by Ricardokevins)