sgl-project/sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
Python · Apache-2.0
Issues
[BUG] srt throws KeyError when sgl.gen(...) regex parameter contains Chinese characters
#377 opened by m0g1cian - 0
Type of self.variables
#455 opened by ChuyueSun - 1
I can't use the OpenAI endpoint with images?
#443 opened by vedantroy - 0
DBRX not working
#454 opened by Ying1123 - 3
Trace OpenAI backend usage
#453 opened by Ying1123 - 0
Regex generation causes 37x lower performance
#450 opened by Gintasz - 2
Does sglang do automatic batching?
#444 opened by vedantroy - 1
How do I get logprobs
#442 opened by tom-doerr - 2
Llama-3 regex generation can get stuck in infinite generation beyond max_tokens and crash the server (reproduction example)
#414 opened by Gintasz - 0
Support for multimodal models
#421 opened by babla9 - 5
vLLM import error
#391 opened by jlin816 - 0
Use `__all__` to explicitly control the APIs.
#386 opened by hnyls2002 - 2
Please add Phi3 support
#407 opened by Curiosity007 - 0
Chat role prefixes for SRT backend broken in latest
#410 opened by qeternity - 1
Support InternVL 1.5
#398 opened by themrzmaster - 8
Logprobs are almost the same for all choices
#388 opened by tom-doerr - 0
LLaVA-v1.6 RuntimeError in llava image encoding
#409 opened by lukashelff - 1
Switch to non-gated models
#387 opened by tom-doerr - 0
Choices functionality breaking with images
#408 opened by dexius-ram-depop - 2
Launch server using local LLaVA checkpoints
#397 opened by lukashelff - 0
[Question] Offload KV cache to disk/database?
#352 opened by s7ev3n - 0
No batched runs when using OpenAI's API format for calls.
#404 opened by xjw00654 - 0
How does RadixAttention implement multi-head/multi-query/grouped-query attention?
#402 opened by Griffintaur - 1
Support Datetime in JSON mode
#400 opened by timothylimyl - 1
Don't get API response when sending images
#357 opened by tom-doerr - 0
Is it normal for LoRA to show no improvement with the default settings?
#396 opened by fisher75 - 1
Loading a BNB 4 bit model + adapter
#374 opened by timothelaborie - 3
Regenerate benchmark results for latest vLLM
#389 opened by nilesh-c - 0
ImportError: cannot import name 'function' from partially initialized module 'sglang'
#384 opened by lambda7xx - 3
ImportError: cannot import name 'get_cuda_stream' from 'triton.runtime.jit' In triton-nightly(V100)
#383 opened by nenomigami - 3
vLLM version
#373 opened by eaubin - 0
Does sglang support multiple completions/samples for the same prompt?
#379 opened by wenting-zhao - 0
Loading Chat Template in a more flexible way?
#376 opened by for-just-we - 0
JSON decoding result doesn't match regex
#371 opened by DouHappy - 2
Fails with latest vllm
#362 opened by benglard - 3
About parameter `max_tokens`
#360 opened by for-just-we - 2
How do I instantiate multiple sgl backends?
#359 opened by accupham - 0
Beam Search Support
#353 opened by LiquidGunay
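Several issues above (#377, #414, #450, #371) concern regex-constrained generation via `sgl.gen(..., regex=...)`. As a rough, hypothetical sketch (not SGLang's actual implementation), constrained decoding keeps only the tokens that leave the output a viable prefix of the pattern. The brute-force viability check below makes every step expensive, which hints at why naive regex mode can slow generation down dramatically; real systems precompile the regex into a finite-state machine so per-token filtering is cheap.

```python
import re
from itertools import product

ALPHABET = "019-"  # toy character-level "vocabulary"

def is_viable_prefix(prefix, pattern, max_extra):
    """Brute-force check: can `prefix` be extended (by up to max_extra
    characters from ALPHABET) into a string that fully matches `pattern`?"""
    rx = re.compile(pattern)
    for n in range(max_extra + 1):
        for tail in product(ALPHABET, repeat=n):
            if rx.fullmatch(prefix + "".join(tail)):
                return True
    return False

def constrained_decode(pattern, score, max_len):
    """Greedy decoding: at each step, among the characters that keep the
    output a viable prefix of `pattern`, pick the one the (toy) model
    scores highest. Stop as soon as the output fully matches the pattern."""
    rx = re.compile(pattern)
    out = ""
    while len(out) < max_len and not rx.fullmatch(out):
        viable = [c for c in ALPHABET
                  if is_viable_prefix(out + c, pattern, max_len - len(out) - 1)]
        if not viable:  # no continuation can ever match -> give up
            break
        out += max(viable, key=score)
    return out

# A toy "model" that always prefers '9'; the regex forces the '-' in place.
result = constrained_decode(r"\d{2}-\d{2}", score=lambda c: "01-9".index(c), max_len=5)
print(result)  # 99-99
```

Note that `is_viable_prefix` re-enumerates continuations for every candidate at every step; a precompiled automaton replaces all of that with a single state-transition lookup per token, which is the usual way to make constrained decoding fast.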