sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python · Apache-2.0
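Many of the issues below involve `sglang.launch_server` and the OpenAI-compatible endpoint. For context, here is a minimal sketch of how a server is typically launched and queried; the model path, port, and `model="default"` value are placeholder assumptions, not taken from this page.

```python
# Launch the server from a shell (assumed invocation; model path and port are placeholders):
#   python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct --port 30000

# Query the OpenAI-compatible endpoint exposed by the server.
import openai

client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="default",  # assumed model alias for the single served model
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```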
Issues

- [Bug] Server crashes after loading (Mixtral 8x7b) on L4 (#1191, opened by nivibilla)
- [Feature] Context Caching (#1248, opened by RonanKMcGovern)
- [Bug] sglang.launch_server error (#1275, opened by andyluo7)
- [Bug] RuntimeError in ModelTpServer (#1323, opened by Lzhang-hub)
- [Feature] support nightly eval (#1272, opened by zhyncs)
- [Bug] T4 Crash (#1325, opened by Abdulhanan535)
- [Bug] Using 8 H20 GPUs, the deepseek-coder-v2-fp8 starts up normally, but there is no response to client requests. (#1329, opened by fengyang95)
- [Bug] Lower single request speed with mla enabled (#1264, opened by halexan)
- [Bug] Unable to fix model output (#1316, opened by cherishhh)
- [Bug] gen with regex: Token fusion between input and output, try to avoid this by removing the space at the end of the input. (#1312, opened by alanxmay)
- [Bug] Facing Error When starting. (#1321, opened by Abdulhanan535)
- [Feature] support smooth-quant? (#1322, opened by Lzhang-hub)
- [Bug] Device-side assert triggered in logits processor when running Llama 3.1 70B (#1274, opened by hrukalive)
- [Bug] Bad outputs with fp8 quantization at high RPS (#1195, opened by siddhatiwari)
- [Bug] subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpx4yubctp/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpx4yubctp/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-lcuda', '-L/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/backends/nvidia/lib' (#1240, opened by ArtificialZeng)
- [Bug] get jammed when deploy Qwen2-72b: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown warnings.warn('resource_tracker: There appear to be %d ' (#1238, opened by ArtificialZeng)
- [Feature] Support phi-3 model (#1283, opened by ByronHsu)
- [Bug] A100 PCIE torch compile error (#1301, opened by zhyncs)
- [Feature] Correctness test for Triton kernels (#1292, opened by ByronHsu)
- [Feature] add option to use liger triton kernel (#1216, opened by binarycrayon)
- [Feature] support long context eval and benchmark (#1273, opened by zhyncs)
- [Feature] support ultravox (#1271, opened by zhyncs)
- [Bug] OpenAI Compatible Prompt Template Error (#1265, opened by BabyChouSr)
- [Bug] cannot set --load-format=dummy with vllm 0.5.5 (#1259, opened by lxww302)
- [Bug] incorrect input_tokens_logprob slicing in RuntimeEndpoint.select method (#1257, opened by jeffrey-fong)
- [Bug] Error in loading Qwen2-57B-A14B-Instruct (#1251, opened by LucienShui)
- Accuracy degrading in concurrent scenario (#1203, opened by frankxyy)
- [Bug] AttributeError: 'ScheduleBatch' object has no attribute 'sample' WHEN I DO Benchmarking (#1241, opened by ArtificialZeng)
- [Feature] Use Embedding/Generation Model to get its Generation/Embedding (#1200, opened by zhaochenyang20)
- [Bug] Empty `top_logprobs` in LogProbs Output for Meta-Llama-3.1-8B-Instruct Model when Using OpenAI Compatible API (#1176, opened by GuanghaoYe)
- [Feature] Jamba 1.5 Support PLS (#1190, opened by nivibilla)
- [Bug] enable-torch-compile error (#1196, opened by siddhatiwari)
- No such file or directory: '/sbin/ldconfig' (#1226, opened by zwc163)
- [Bug] vllm updated its get_model function (#1183, opened by zhaochenyang20)
- [Feature] Repeated generation expression (#1175, opened by laurens-gs)
- [Help wanted] Does RadixAttention have anything to do with attention? (#1181, opened by Wanglongzhi2001)
- [Bug] Runtime Stuck (#1173, opened by Ricardokevins)