RuntimeError: CUDA error: no kernel image is available for execution on the device
mrpeerat commented
Hi,
I ran the following example code:
import os
import deepspeed
import torch
from transformers import pipeline

local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

# Build the HF pipeline on the local GPU
generator = pipeline('text-generation', model='EleutherAI/gpt-neo-2.7B',
                     device=local_rank)

# Wrap the model with DeepSpeed inference (tensor parallelism + kernel injection)
generator.model = deepspeed.init_inference(generator.model,
                                           tensor_parallel={"tp_size": world_size},
                                           dtype=torch.float,
                                           replace_with_kernel_inject=True)

string = generator("DeepSpeed is", do_sample=True, min_length=50)
if not torch.distributed.is_initialized() or torch.distributed.get_rank() == 0:
    print(string)
It fails with the following error:
Traceback (most recent call last):
File "/home/aisg/peerat/imp/test.py", line 13, in <module>
generator.model = deepspeed.init_inference(generator.model,
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/__init__.py", line 364, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 156, in __init__
self._apply_injection_policy(config)
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 413, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 393, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 642, in replace_module
replaced_module, _ = _replace_module(model, policy, state_dict=sd)
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 702, in _replace_module
_, layer_id = _replace_module(child,
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 702, in _replace_module
_, layer_id = _replace_module(child,
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 678, in _replace_module
replaced_module = policies[child.__class__][0](child,
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 321, in replace_fn
new_module = replace_with_policy(child,
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 234, in replace_with_policy
_container.initialize_tensors()
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/module_inject/containers/features/meta_tensor.py", line 26, in initialize_tensors
super().initialize_tensors(enable_training=enable_training)
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/module_inject/containers/features/hybrid_engine.py", line 30, in initialize_tensors
super().initialize_tensors(enable_training=enable_training)
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/module_inject/containers/base.py", line 142, in initialize_tensors
self.set_attention(*self.policy.attention(enable_training=enable_training))
File "/shared/miniconda3/envs/peerat_mllm/lib/python3.10/site-packages/deepspeed/module_inject/containers/gptneo.py", line 128, in attention
qkvw = Parameter(torch.cat((qw, kw, vw), dim=0), requires_grad=enable_training)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Here is my environment:
torch 2.0.1
deepspeed 0.15.3
transformers 4.38.0
CUDA 12.2
Python 3.10.15
GPU: 8x A100 (80 GB)
I tried re-installing DeepSpeed with `DS_BUILD_FUSED_ADAM=1 pip install deepspeed`, but I still get the same error.
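In case it helps narrow things down, here is a small diagnostic I can run to compare the GPU's compute capability against the architectures the installed PyTorch wheel was compiled for (just a sketch; an A100 should report sm_80):

import torch

# Compute capability of the local GPU; an A100 should report (8, 0), i.e. sm_80.
major, minor = torch.cuda.get_device_capability(0)
print(f"Device: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")

# CUDA architectures the installed PyTorch wheel ships kernels for.
print("PyTorch CUDA arch list:", torch.cuda.get_arch_list())

# CUDA version PyTorch was built against (vs. the system CUDA 12.2).
print("PyTorch built with CUDA:", torch.version.cuda)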
Any suggestions?
Thank you.