CUDA error: no kernel image is available for execution on the device

Question

mateuszwosinski opened this issue 4 years ago · 1 comments

I get the following error while trying to train a model with inplace-abn:

~/.conda/envs/py37/lib/python3.7/site-packages/inplace_abn/functions.py in inplace_abn(x, weight, bias, running_mean, running_var, training, momentum, eps, activation, activation_param)
152 training=True, momentum=0.1, eps=1e-05, activation="leaky_relu", activation_param=0.01):
153 return InPlaceABN.apply(x, weight, bias, running_mean, running_var,
--> 154 training, momentum, eps, activation, activation_param, None)
155
156

~/.conda/envs/py37/lib/python3.7/site-packages/inplace_abn/functions.py in forward(ctx, x, weight, bias, running_mean, running_var, training, momentum, eps, activation, activation_param, group)
83
84 # Update running stats
---> 85 count_ = count.to(dtype=var.dtype)
86 running_mean.mul_((1 - ctx.momentum)).add_(ctx.momentum * mean)
87 running_var.mul_((1 - ctx.momentum)).add_(ctx.momentum * var * count_ / (count_ - 1))

RuntimeError: CUDA error: no kernel image is available for execution on the device

These are my settings:
GPU Device: Tesla V100-SXM2-16GB
GPU mounted at: cuda:0
PyTorch Version: 1.4.0
Torchvision Version: 0.5.0
CUDA version: 10.0

Any suggestions on how to solve it? I have no problems training models without inplace-abn.

Answer 1 · 2020-09-30T15:46:25.000Z

@mateuszwosinski The error you are encountering means that the InPlace ABN CUDA kernels were compiled for a different GPU architecture than the one you are running on. By any chance, did you install InPlace ABN on a different machine than the one you are using for training?

In any case, a possible solution is to re-install after setting the TORCH_CUDA_ARCH_LIST environment variable to the compute capability of your GPU (7.0 for a Tesla V100), e.g. by running: TORCH_CUDA_ARCH_LIST="7.0" pip install inplace-abn.
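
If you are not sure which value to use, you can ask PyTorch for the compute capability of the GPU on the training machine. The snippet below is only a minimal sketch: the helper names `arch_list_value` and `detect_capability` are made up for illustration and are not part of inplace-abn, while `torch.cuda.get_device_capability` is the standard PyTorch call. It assumes you run it in the same environment, on the same machine, that you train on.

```python
def arch_list_value(capability):
    """Format a (major, minor) compute capability as a TORCH_CUDA_ARCH_LIST entry.

    Hypothetical helper for illustration; not part of inplace-abn or PyTorch.
    """
    major, minor = capability
    return f"{major}.{minor}"


def detect_capability(device_index=0):
    """Return the (major, minor) capability of a CUDA device, or None.

    None means PyTorch is not installed or cannot see a CUDA device.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


if __name__ == "__main__":
    capability = detect_capability()
    if capability is None:
        print("No CUDA device visible to PyTorch; run this on the training machine.")
    else:
        # Print the re-install command with the detected architecture filled in.
        value = arch_list_value(capability)
        print(f'TORCH_CUDA_ARCH_LIST="{value}" pip install inplace-abn')
```

On a V100 this should print the same command as above. If the error persists after re-installing, pip may be reusing a wheel it built and cached earlier; uninstalling first and adding `--no-cache-dir` to the install command is one way to force a fresh build.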