vLLM throws an error when loading Qwen2.5-1.5B after LoRA fine-tuning with LLaMA-Factory, looking for advice
Jimmy-L99 commented
Model Series
Qwen2.5
What are the models used?
Qwen2.5-1.5b
What is the scenario where the problem happened?
Inference with [vLLM] after [LoRA SFT fine-tuning] with [LLaMA-Factory]
Is this a known issue?
- I have followed the GitHub README.
- I have checked the Qwen documentation and cannot find an answer there.
- I have checked the documentation of the related framework and cannot find useful information.
- I have searched the issues and there is not a similar one.
Information about environment
- Ubuntu 20.04
- Python 3.11
- GPU A100*2
- CUDA 11.4
- PyTorch 2.4.0+cu121
Log output
Description
Steps:

1. Fine-tune with LLaMA-Factory using LoRA, starting from a modified copy of the officially provided YAML config; this produces `checkpoint-180`.
2. Run inference with vLLM plus the LoRA adapter:
The LoRA-related part of the code:

```python
import time

from vllm.lora.request import LoRARequest

# Point vLLM at the LoRA adapter produced by LLaMA-Factory.
lora_request = LoRARequest(
    lora_name="lora",
    lora_int_id=1,  # unique integer id for this adapter
    lora_local_path="/root/ljm/LoRA/LoRA_litchi1/qwen_model/keyword_finetune_qwen2.5-1.5b-lora_lr1e-4_r16_alpha32_ld0.05/checkpoint-180",
)

# Pass the adapter per request.
async for output in engine.generate(
    prompt=inputs,
    sampling_params=sampling_params,
    request_id=f"{time.time()}",
    lora_request=lora_request,
):
    ...
```
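Note: newer vLLM releases appear to deprecate `lora_local_path` in favor of `lora_path`; if the installed version is one of those, the equivalent construction would be the sketch below (an assumption about the installed vLLM version, not a confirmed cause of the crash):

```python
lora_request = LoRARequest(
    lora_name="lora",
    lora_int_id=1,
    lora_path="/root/ljm/LoRA/.../checkpoint-180",  # same adapter directory as above
)
```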
Engine setup:

```python
from vllm import AsyncEngineArgs, AsyncLLMEngine

engine_args = AsyncEngineArgs(
    model=MODEL_PATH,
    tokenizer=MODEL_PATH,
    tensor_parallel_size=1,
    dtype="bfloat16",
    trust_remote_code=True,
    gpu_memory_utilization=0.2,
    enforce_eager=True,
    disable_log_requests=True,
    enable_lora=True,
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```
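For what it's worth, the adapter was trained with rank 16 (the `r16` in the checkpoint path), which fits vLLM's default `max_lora_rank` of 16, so the rank limit should not be the problem here; with a higher-rank adapter the limit would need to be raised explicitly. A sketch of that (values are assumptions for illustration, not a tested fix):

```python
# Only needed if the adapter's LoRA rank exceeds vLLM's default limit of 16,
# e.g. for a hypothetical rank-32 adapter:
engine_args = AsyncEngineArgs(
    model=MODEL_PATH,
    enable_lora=True,
    max_loras=1,       # adapters that may be active in one batch
    max_lora_rank=32,  # must be >= the adapter's LoRA rank r
)
```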
The error:
(qwen2_5) root@node1:~# python /root/ljm/Qwen2.5/qwen2_5_1.5B_vLLM_api_server.py
/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash:
No module named 'vllm._version'
from vllm.version import __version__ as VLLM_VERSION
WARNING 10-17 11:09:46 config.py:380] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
INFO 10-17 11:09:46 llm_engine.py:237] Initializing an LLM engine (vdev) with config: model='/root/ljm/models/Qwen2.5-1.5B-Instruct', speculative_config=None, tokenizer='/root/ljm/models/Qwen2.5-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/root/ljm/models/Qwen2.5-1.5B-Instruct, use_v2_block_manager=True, num_scheduler_steps=1, chunked_prefill_enabled=False multi_step_stream_outputs=True, enable_prefix_caching=False, use_async_output_proc=False, use_cached_outputs=False, mm_processor_kwargs=None)
INFO 10-17 11:09:47 model_runner.py:1060] Starting to load model /root/ljm/models/Qwen2.5-1.5B-Instruct...
Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 1.17it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 1.17it/s]
INFO 10-17 11:09:49 model_runner.py:1071] Loading model weights took 2.8875 GB
/usr/bin/ld: cannot find -lcuda
collect2: error: ld returned 1 exit status
INFO 10-17 11:09:50 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20241017-110950.pkl...
INFO 10-17 11:09:50 model_runner_base.py:149] Completed writing input of failed execution to /tmp/err_execute_model_input_20241017-110950.pkl.
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1665, in execute_model
[rank0]: hidden_or_intermediate_states = model_executable(
[rank0]: ^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 415, in forward
[rank0]: hidden_states = self.model(input_ids, positions, kv_caches,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 288, in forward
[rank0]: hidden_states, residual = layer(
[rank0]: ^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 210, in forward
[rank0]: hidden_states = self.self_attn(
[rank0]: ^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 154, in forward
[rank0]: qkv, _ = self.qkv_proj(hidden_states)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]: return self._call_impl(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]: return forward_call(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/lora/layers.py", line 462, in forward
[rank0]: output_parallel = self.apply(input_, bias)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/lora/layers.py", line 860, in apply
[rank0]: self.punica_wrapper.add_lora_packed_nslice(output, x,
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/lora/punica.py", line 577, in add_lora_packed_nslice
[rank0]: self.add_lora(y, x, lora_a_stacked[slice_idx],
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/lora/punica.py", line 545, in add_lora
[rank0]: self.add_shrink(buffer, x, wa_t_all, scale)
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/lora/punica.py", line 467, in add_shrink
[rank0]: shrink_fun(y, x, w_t_all, scale)
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/lora/punica.py", line 372, in shrink_prefill
[rank0]: sgmv_shrink(
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/_library/custom_ops.py", line 506, in __call__
[rank0]: return self._opoverload(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/_ops.py", line 667, in __call__
[rank0]: return self_._op(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/_library/custom_ops.py", line 494, in adinplaceorview_impl
[rank0]: return self._opoverload.redispatch(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/_ops.py", line 672, in redispatch
[rank0]: return self_._handle.redispatch_boxed(keyset, *args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/_library/custom_ops.py", line 236, in backend_impl
[rank0]: result = self._backend_fns[device_type](*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/lora/ops/sgmv_shrink.py", line 170, in _sgmv_shrink
[rank0]: _sgmv_shrink_kernel[grid](
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/triton/runtime/jit.py", line 345, in <lambda>
[rank0]: return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/triton_utils/libentry.py", line 82, in run
[rank0]: kernel = self.fn.run(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/triton/runtime/jit.py", line 607, in run
[rank0]: device = driver.active.get_current_device()
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/triton/runtime/driver.py", line 23, in __getattr__
[rank0]: self._initialize_obj()
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/triton/runtime/driver.py", line 20, in _initialize_obj
[rank0]: self._obj = self._init_fn()
[rank0]: ^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/triton/runtime/driver.py", line 9, in _create_driver
[rank0]: return actives[0]()
[rank0]: ^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 371, in __init__
[rank0]: self.utils = CudaUtils() # TODO: make static
[rank0]: ^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 80, in __init__
[rank0]: mod = compile_module_from_src(Path(os.path.join(dirname, "driver.c")).read_text(), "cuda_utils")
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/triton/backends/nvidia/driver.py", line 57, in compile_module_from_src
[rank0]: so = _build(name, src_path, tmpdir, library_dirs(), include_dir, libraries)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/triton/runtime/build.py", line 48, in _build
[rank0]: ret = subprocess.check_call(cc_cmd)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/subprocess.py", line 413, in check_call
[rank0]: raise CalledProcessError(retcode, cmd)
[rank0]: subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmp8a6wig5a/main.c', '-O3', '-shared', '-fPIC', '-o', '/tmp/tmp8a6wig5a/cuda_utils.cpython-311-x86_64-linux-gnu.so', '-lcuda', '-L/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/triton/backends/nvidia/lib', '-L/lib/x86_64-linux-gnu', '-I/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/triton/backends/nvidia/include', '-I/tmp/tmp8a6wig5a', '-I/root/anaconda3/envs/qwen2_5/include/python3.11']' returned non-zero exit status 1.
[rank0]: During handling of the above exception, another exception occurred:
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/ljm/Qwen2.5/qwen2_5_1.5B_vLLM_api_server.py", line 244, in <module>
[rank0]: engine = AsyncLLMEngine.from_engine_args(engine_args)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 674, in from_engine_args
[rank0]: engine = cls(
[rank0]: ^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 569, in __init__
[rank0]: self.engine = self._engine_class(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 265, in __init__
[rank0]: super().__init__(*args, **kwargs)
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 349, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 484, in _initialize_kv_caches
[rank0]: self.model_executor.determine_num_available_blocks())
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/executor/gpu_executor.py", line 114, in determine_num_available_blocks
[rank0]: return self.driver_worker.determine_num_available_blocks()
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/worker/worker.py", line 223, in determine_num_available_blocks
[rank0]: self.model_runner.profile_run()
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1309, in profile_run
[rank0]: self.execute_model(model_input, kv_caches, intermediate_tensors)
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/anaconda3/envs/qwen2_5/lib/python3.11/site-packages/vllm/worker/model_runner_base.py", line 152, in _wrapper
[rank0]: raise type(err)(
[rank0]: ^^^^^^^^^^
[rank0]: TypeError: CalledProcessError.__init__() missing 1 required positional argument: 'cmd'
However, if I remove the LoRA-related code, the server starts and inference works normally.
How should I serve the LoRA fine-tuned model? Am I using it the wrong way?
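For reference, the first real failure in the log is `/usr/bin/ld: cannot find -lcuda`: with `enable_lora=True`, vLLM JIT-compiles its Triton punica (SGMV) kernels during the startup profile run, and Triton's build step compiles a small helper module linked with `-lcuda`, so `gcc`/`ld` must be able to resolve an unversioned `libcuda.so`. (The final `TypeError: CalledProcessError.__init__() missing 1 required positional argument: 'cmd'` is just vLLM's error wrapper failing to re-raise the exception; the underlying error is the linker failure.) The gcc command in the log already passes `-L/lib/x86_64-linux-gnu`, which suggests `libcuda.so.1` is in that directory but the unversioned `libcuda.so` symlink the linker looks for is missing. A possible workaround, assuming that path on this machine (verify with `ldconfig` first):

```bash
# Confirm where the NVIDIA driver installed libcuda
ldconfig -p | grep libcuda
# e.g.: libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1

# Give the linker an unversioned libcuda.so to resolve -lcuda against
sudo ln -s /lib/x86_64-linux-gnu/libcuda.so.1 /lib/x86_64-linux-gnu/libcuda.so
```

Recent Triton builds also appear to honor a `TRITON_LIBCUDA_PATH` environment variable pointing at the directory containing `libcuda.so`, which would avoid the symlink; I have not confirmed this for the exact Triton version bundled here.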