LORA: RuntimeError: GET was unable to find an engine to execute this computation
LamOne1 opened this issue · 1 comment
LamOne1 commented
Hello,
I ran the adapter_v2 code without any issue, but I couldn't run the lora code in the same environment and without changing anything in the code.
[rank: 0] Global seed set to 1337
Traceback (most recent call last):
File ".../finetune/lora.py", line 218, in <module>
CLI(main)
File ".../.conda/envs/llm2/lib/python3.9/site-packages/jsonargparse/cli.py", line 85, in CLI
return _run_component(component, cfg_init)
File ".../.conda/envs/llm2/lib/python3.9/site-packages/jsonargparse/cli.py", line 147, in _run_component
return component(**cfg)
File ".../finetune/lora.py", line 79, in main
train(fabric, model, optimizer, train_data, val_data, out_dir)
File ".../finetune/lora.py", line 112, in train
logits = model(input_ids)
File ".../.conda/envs/llm2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File ".../.conda/envs/llm2/lib/python3.9/site-packages/lightning-2.1.0.dev0-py3.9.egg/lightning/fabric/wrappers.py", line 116, in forward
output = self._forward_module(*args, **kwargs)
File ".../.conda/envs/llm2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File ".../.conda/envs/llm2/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File ".../.conda/envs/llm2/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
File ".../.conda/envs/llm2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File ".../lit_llama/model.py", line 105, in forward
x, _ = block(x, rope, mask, max_seq_length)
File ".../.conda/envs/llm2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File ".../lit_llama/model.py", line 164, in forward
h, new_kv_cache = self.attn(self.rms_1(x), rope, mask, max_seq_length, input_pos, kv_cache)
File ".../.conda/envs/llm2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File ".../lit_llama/model.py", line 196, in forward
q, k, v = self.c_attn(x).split(self.n_embd, dim=2)
File ".../.conda/envs/llm2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File ".../lit_llama/lora.py", line 318, in forward
after_B = F.conv1d(
RuntimeError: GET was unable to find an engine to execute this computation
LamOne1 commented
The issue was fixed after changing the GPU from an A100 to a V100.
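For anyone hitting the same error: it is raised by cuDNN when it cannot pick a kernel ("engine") for the requested convolution on the current device/dtype combination. A quick way to see whether your environment can run the kind of 1-D convolution the LoRA forward uses is a minimal standalone repro (this is a hedged sketch, not code from lit_llama; the shapes and `groups` value are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

# Minimal repro sketch: run a grouped F.conv1d like the one in
# lit_llama/lora.py's forward, to check whether the current
# device/cuDNN build can find an engine for it.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(1, 64, 16, device=device)        # (batch, in_channels, seq_len)
weight = torch.randn(128, 32, 1, device=device)  # (out_channels, in_channels/groups, kernel)

out = F.conv1d(x, weight, groups=2)
print(out.shape)  # torch.Size([1, 128, 16])
```

If this fails on GPU but works on CPU (or with a different dtype, e.g. after casting the inputs to `torch.float32`), the problem is the device/dtype support in your cuDNN build rather than the LoRA code itself.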