Training on A6000 with CUDA 11.7
DarylWM opened this issue · 0 comments
DarylWM commented
Hello. Thank you for sharing this code; I think it's really interesting work and I'm keen to try it on my dataset. To get started, I'd like to train the example on an A6000 using CUDA 11.7. I tried the versions specified in environment.yml, but needed to bump the CUDA version to get A6000 support. Now I get this error:
```
experiments/experiment_1/
backing up... done.
train.py:1301: DeprecationWarning: Starting with ImageIO v3 the behavior of this function will switch to that of iio.v3.imread. To keep the current behavior (and make this warning disappear) use `import imageio.v2 as imageio` or call `imageio.v2.imread` directly.
  return imageio.imread(f, ignoregamma=True) if f[-4:] == ".png" else imageio.imread(f)
Loaded llff (86, 384, 512, 3) (120, 3, 5) [384. 512. 256.60952759] data/example_sequence/
DEFINING BOUNDS
NEAR FAR 0.0021997066447511314 1.0024441480636597
Found ckpts []
start: 0 args.N_iters: 200000
C:\Users\dwil6816\Anaconda3\envs\nrnerf\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3191.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
get rays
done, concats
(86, 384, 512, 4, 3)
TRAIN views are [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85]
TEST views are []
VAL views are []
Begin
  0%|          | 0/200000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 2016, in <module>
    main_function(args)
  File "train.py", line 1566, in main_function
    losses = parallel_training(
  File "C:\Users\dwil6816\Anaconda3\envs\nrnerf\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\dwil6816\Anaconda3\envs\nrnerf\lib\site-packages\torch\nn\parallel\data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "C:\Users\dwil6816\Anaconda3\envs\nrnerf\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "train.py", line 188, in forward
    imageid_to_timestepid[batch_pixel_indices[:, 0]], :
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
```
Here are the Torch versions I'm using:
```
torch        1.13.1+cu117
torchaudio   0.13.1+cu117
torchvision  0.14.1+cu117
```
I'm new to PyTorch, so I'm wondering whether there's a global fix, or whether I need to go through and check where the tensors are allocated.
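For what it's worth, here is a minimal sketch of what the error means and one way to fix it. The tensor names mirror those in the traceback, but the shapes and surrounding logic are stand-ins, not the actual train.py code: the indexed lookup table lives on the CPU while the indices live on the GPU, and moving the table onto the indices' device resolves the mismatch.

```python
import torch

# Stand-in for the lookup table from train.py; by default it lives on the CPU.
imageid_to_timestepid = torch.arange(86)

# Stand-in for the batch indices, which end up on the GPU inside the model's
# forward pass (falls back to CPU here if no GPU is available).
device = "cuda" if torch.cuda.is_available() else "cpu"
batch_pixel_indices = torch.zeros(4, 3, dtype=torch.long, device=device)

# Indexing a CPU tensor with CUDA indices raises:
#   RuntimeError: indices should be either on cpu or on the same device
#   as the indexed tensor (cpu)
# Moving the lookup table onto the indices' device avoids the mismatch:
imageid_to_timestepid = imageid_to_timestepid.to(batch_pixel_indices.device)
timesteps = imageid_to_timestepid[batch_pixel_indices[:, 0]]
```

In newer PyTorch versions this device check became stricter, which may be why the error only appears after upgrading; a one-line `.to(...)` at the point where `imageid_to_timestepid` is created (rather than a global change) would likely be enough.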