sgl-project/sglang

[Bug] subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpx4yubctp/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpx4yubctp/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-lcuda', '-L/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/backends/nvidia/lib'

Closed this issue · 2 comments

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
  • 4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.
  • 5. Please use English, otherwise it will be closed.

Describe the bug

Traceback (most recent call last):
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 878, in run_tp_server
model_server.exposed_step(recv_reqs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 234, in exposed_step
self.forward_step()
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 250, in forward_step
self.forward_prefill_batch(new_batch)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 489, in forward_prefill_batch
sample_output, logits_output = self.model_runner.forward(
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 579, in forward
return self.forward_extend(batch)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 543, in forward_extend
return self.model.forward(
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/models/qwen2.py", line 292, in forward
hidden_states = self.model(input_ids, positions, input_metadata, input_embeds)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/models/qwen2.py", line 257, in forward
hidden_states, residual = layer(
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/models/qwen2.py", line 209, in forward
hidden_states = self.self_attn(
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/models/qwen2.py", line 158, in forward
attn_output = self.attn(q, k, v, input_metadata)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/layers/radix_attention.py", line 201, in forward
return self.extend_forward(q, k, v, input_metadata)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/layers/radix_attention.py", line 73, in extend_forward_triton
extend_attention_fwd(
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/sglang/srt/layers/extend_attention.py", line 293, in extend_attention_fwd
_fwd_kernel[grid](
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/runtime/jit.py", line 345, in <lambda>
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/runtime/jit.py", line 607, in run
device = driver.active.get_current_device()
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/runtime/driver.py", line 23, in __getattr__
self._initialize_obj()
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
self._obj = self._init_fn()
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/runtime/driver.py", line 9, in _create_driver
return actives[0]()
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 371, in __init__
self.utils = CudaUtils() # TODO: make static
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/runtime/build.py", line 48, in _build
ret = subprocess.check_call(cc_cmd)
File "/home/adminad/anaconda3/envs/py10/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpx4yubctp/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmpx4yubctp/cuda_utils.cpython-310-x86_64-linux-gnu.so', '-lcuda', '-L/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/backends/nvidia/lib', '-L/lib/x86_64-linux-gnu', '-L/lib/i386-linux-gnu', '-I/home/adminad/anaconda3/envs/py10/lib/python3.10/site-packages/triton/backends/nvidia/include', '-I/tmp/tmpx4yubctp', '-I/home/adminad/anaconda3/envs/py10/include/python3.10']' returned non-zero exit status 1.
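The traceback ends in subprocess.check_call, which reports only the exit status, so the compiler's own error message is not shown. A minimal sketch for surfacing it, assuming gcc is on PATH: compile a trivial C file with the same kind of flags (-shared, -fPIC, -lcuda) and print stderr. The helper names run_capture and probe_command are hypothetical and not part of Triton or sglang, and the probe only exercises the compiler and the libcuda link step, not Triton's driver.c.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_capture(cmd):
    """Run cmd and return (exit_code, stderr_text) instead of raising."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stderr


def probe_command(cc, src, out, lib_dirs=(), include_dirs=()):
    """Build a compile command shaped like the failing one in the traceback."""
    cmd = [cc, str(src), "-O3", "-shared", "-fPIC", "-o", str(out), "-lcuda"]
    cmd += [f"-L{d}" for d in lib_dirs]
    cmd += [f"-I{d}" for d in include_dirs]
    return cmd


if __name__ == "__main__":
    cc = shutil.which("gcc")
    if cc is None:
        print("gcc not found on PATH", file=sys.stderr)
    else:
        with tempfile.TemporaryDirectory() as tmp:
            # Trivial source file; an assumption standing in for Triton's main.c.
            src = Path(tmp) / "probe.c"
            src.write_text("int probe(void) { return 0; }\n")
            code, err = run_capture(probe_command(cc, src, Path(tmp) / "probe.so"))
            print("exit status:", code)
            print(err or "(no compiler output)", file=sys.stderr)
```

If the probe fails, the stderr text (for example, a message that -lcuda cannot be found) is likely more informative than the exit status alone. Extra -L directories from the failing command can be passed through lib_dirs.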

Reproduction

SGLANG_USE_MODELSCOPE=true python -m sglang.launch_server --model-path qwen/Qwen2-72B-Instruct --port 30000 --tp 8 --mem-fraction-static 0.8 --dtype bfloat16 --disable-cuda-graph --context-length 512 --disable-flashinfer --disable-flashinfer-sampling

Environment

Python 3.10, sglang 0.2.14

Try using Docker:

docker pull lmsysorg/sglang:latest

It seems that your GCC version is too old to run SGLang. Try upgrading to GCC 11 and rerunning your script.
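To check the installed compiler before upgrading, something like the following might help. It is a rough sketch: the threshold of 11 comes from the suggestion above and is not an officially documented requirement, and the function names are hypothetical. It relies on gcc -dumpversion, which prints either a bare major version or a full dotted version depending on how GCC was built.

```python
import re
import shutil
import subprocess

# Threshold taken from the reply above; an assumption, not a documented minimum.
MIN_GCC_MAJOR = 11


def parse_gcc_major(version_text):
    """Return the major version from text such as '9.4.0' or '11', else None."""
    match = re.match(r"\s*(\d+)", version_text)
    return int(match.group(1)) if match else None


def gcc_major(cc="gcc"):
    """Ask the compiler for its version; return None if it is not installed."""
    path = shutil.which(cc)
    if path is None:
        return None
    result = subprocess.run([path, "-dumpversion"], capture_output=True, text=True)
    return parse_gcc_major(result.stdout)


if __name__ == "__main__":
    major = gcc_major()
    if major is None:
        print("gcc not found or version not recognized")
    elif major < MIN_GCC_MAJOR:
        print(f"gcc major version {major} is older than {MIN_GCC_MAJOR}")
    else:
        print(f"gcc major version {major} meets the suggested minimum")
```

Note that Triton invoked /usr/bin/gcc in the traceback, so the gcc found on PATH inside a conda environment may differ from the one that actually failed; passing the full path as cc avoids that ambiguity.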