NVIDIA/TensorRT-LLM

[Doc]: Failed to parse the arguments for the LLM constructor: _TrtLLM got invalid argument: disable_overlap_scheduler

Closed this issue · 5 comments

📚 The doc issue

CUDA_VISIBLE_DEVICES=0,1,2,3 \
trtllm-serve /mnt/model/DeepSeek-R1-Distill-Qwen-32B \
  --tp_size 4 \
  --trust_remote_code \
  --kv_cache_free_gpu_memory_fraction 0.9 \
  --host localhost --port 8001 \
  --extra_llm_api_options ./ctx_extra-llm-api-config.yaml

[2025-09-16 03:59:51] INFO config.py:54: PyTorch version 2.8.0a0+5228986c39.nv25.5 available.
[2025-09-16 03:59:51] INFO config.py:66: Polars version 1.25.2 available.
2025-09-16 03:59:57,726 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
[TensorRT-LLM] TensorRT-LLM version: 1.0.0rc4
[09/16/2025-03:59:59] [TRT-LLM] [E] Failed to parse the arguments for the LLM constructor: _TrtLLM got invalid argument: disable_overlap_scheduler
Traceback (most recent call last):
File "/usr/local/bin/trtllm-serve", line 8, in
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1161, in call
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/commands/serve.py", line 302, in serve
launch_server(host, port, llm_args, metadata_server_cfg, server_role)
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/commands/serve.py", line 145, in launch_server
llm = LLM(**llm_args)
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py", line 722, in init
super().init(model, tokenizer, tokenizer_mode, skip_tokenizer_init,
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py", line 160, in init
raise e
File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/llmapi/llm.py", line 141, in init
raise ValueError(
ValueError: _TrtLLM got invalid argument: disable_overlap_scheduler

Suggest a potential alternative/fix

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

disable_overlap_scheduler is only available for the PyTorch backend. Can you check if you have backend: trt in your ctx_extra-llm-api-config.yaml?
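
If the YAML currently pins the TensorRT engine path, switching it to the PyTorch backend makes the option valid. A minimal sketch (the backend key follows the maintainer's comment above; the values are illustrative, not taken from the reporter's file):

backend: pytorch
disable_overlap_scheduler: True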

CUDA_VISIBLE_DEVICES=0,1,2,3 trtllm-serve /mnt/model/DeepSeek-R1-Distill-Qwen-32B --tp_size 4 --trust_remote_code --kv_cache_free_gpu_memory_fraction 0.9 --host localhost --port 8001 --extra_llm_api_options ./ctx_extra-llm-api-config.yaml --backend pytorch

adding "--backend pytorch" is OK

ctx_extra-llm-api-config.yaml

# The overlap scheduler for context servers is currently disabled, as it is not yet supported in disaggregated context server architectures.
disable_overlap_scheduler: True
cache_transceiver_config:
  backend: default
  max_tokens_in_buffer: 2048
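
For reference, the same arguments can also be passed programmatically through the LLM API, which is where the error above originates. A minimal sketch, assuming a recent build in which tensorrt_llm.LLM defaults to the PyTorch backend (the model path and tensor parallelism come from the report; everything else is illustrative):

from tensorrt_llm import LLM

# disable_overlap_scheduler is accepted by the PyTorch-backend LLM;
# the TRT-engine class (_TrtLLM in the traceback) rejects it.
llm = LLM(
    model="/mnt/model/DeepSeek-R1-Distill-Qwen-32B",
    tensor_parallel_size=4,
    disable_overlap_scheduler=True,
)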

Issue has not received an update in over 14 days. Adding stale label.

This issue was closed because it has been 14 days without activity since it has been marked as stale.