yzxing87/Invertible-ISP

RuntimeError: CUDA error: an illegal memory access was encountered

ggao33 opened this issue · 1 comments

Hi,
I am currently facing the issue below when running train.py. Could you please give me a hand?
My environment is as follows:

  • Ubuntu 18.04
  • NVIDIA-SMI 455.45.01
  • Driver Version: 455.45.01
  • CUDA Version: 11.1
  • python 3.8
  • torch 1.8.0

/home/anaconda3/bin/python /home/Documents/Invertible-ISP-main/train_cuda.py --task=debug --data_path=./data/ --gamma --aug --camera=NIKON_D700 --out_path=./exps/ --debug_mode
Parsed arguments: Namespace(aug=True, batch_size=1, camera='NIKON_D700', data_path='./data/', debug_mode=True, gamma=True, loss='L1', lr=0.0001, out_path='./exps/', resume=False, rgb_weight=1, task='debug')
[INFO] Start data loading and preprocessing
[INFO] Start to train
Traceback (most recent call last):
  File "/home/Documents/Invertible-ISP-main/train_cuda.py", line 99, in <module>
    main(args)
  File "/home/Documents/Invertible-ISP-main/train_cuda.py", line 72, in main
    reconstruct_raw = net(reconstruct_rgb, rev=True)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/Documents/Invertible-ISP-main/model/model.py", line 176, in forward
    out = op.forward(out, rev)
  File "/home/Documents/Invertible-ISP-main/model/model.py", line 124, in forward
    self.s = self.clamp * (torch.sigmoid(self.H(x1)) * 2 - 1)
RuntimeError: CUDA error: an illegal memory access was encountered

Process finished with exit code 1

If I switch to the invertible-isp environment specified in your environment.yml, the code silently hangs at
line 22: DiffJPEG = DiffJPEG(differentiable=True, quality=90).cuda()
without showing any error or printing "Start to train".

Hi, thanks for your interest.

Our provided environment.yml should work fine with CUDA 10.1. If you are using CUDA 11.1, please install PyTorch 1.7.1. Please also note that only recent PyTorch versions (e.g. >1.7.0) work on CUDA 11 machines. With older versions, the program may get stuck.
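
To see which combination is actually installed before changing anything, a small check like the one below may help. It is only a sketch: the helper names (version_tuple, newer_than_1_7_0, report) are made up for illustration and are not part of this repository, and the 1.7.0 threshold simply encodes the rule of thumb from the reply above rather than an official compatibility table.

```python
import re


def version_tuple(version: str) -> tuple:
    """Parse a version string such as '1.8.0+cu111' into (1, 8, 0)."""
    core = version.split("+")[0]  # drop a local build tag such as '+cu111'
    parts = re.findall(r"\d+", core)
    return tuple(int(p) for p in parts[:3])


def newer_than_1_7_0(version: str) -> bool:
    """Rule of thumb from this thread (an assumption, not an official table)."""
    return version_tuple(version) > (1, 7, 0)


def report() -> str:
    """Describe the installed PyTorch build, if any."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    return (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"GPU visible: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(report())
```

Note that torch.version.cuda is the CUDA version PyTorch was built against, while the "CUDA Version" shown by nvidia-smi is the highest version the driver supports, so the two numbers do not have to match exactly.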