Harry-Zhi/semantic_nerf

RuntimeError: CUDA out of memory.

Jiazxzx opened this issue · 2 comments

I want to run this on my Windows 10 machine, and the GPU is an RTX 2060 Super (8 GB). Maybe the GPU is not good enough :) , but I just want to try to get a training run through.

Traceback (most recent call last):
  File "train_SSR_main.py", line 225, in <module>
    train()
  File "train_SSR_main.py", line 213, in train
    ssr_trainer.step(global_step)
  File "D:\Users\admin\Documents\GitHub\semantic_nerf\SSR\training\trainer.py", line 981, in step
    rgbs, disps, deps, vis_deps, sems, vis_sems, sem_uncers, vis_sem_uncers = self.render_path(self.rays_vis, save_dir=trainsavedir)
  File "D:\Users\admin\Documents\GitHub\semantic_nerf\SSR\training\trainer.py", line 1156, in render_path
    output_dict = self.render_rays(rays[i])
  File "D:\Users\admin\Documents\GitHub\semantic_nerf\SSR\training\trainer.py", line 703, in render_rays
    all_ret = batchify_rays(fn, flat_rays, self.chunk)
  File "D:\Users\admin\Documents\GitHub\semantic_nerf\SSR\training\training_utils.py", line 10, in batchify_rays
    ret = render_fn(rays_flat[i:i + chunk])
  File "D:\Users\admin\Documents\GitHub\semantic_nerf\SSR\training\trainer.py", line 745, in volumetric_rendering
    raw_coarse = run_network(pts_coarse_sampled, viewdirs, self.ssr_net_coarse,
  File "D:\Users\admin\Documents\GitHub\semantic_nerf\SSR\models\model_utils.py", line 31, in run_network
    embedded = torch.cat([embedded, embedded_dirs], -1)
RuntimeError: CUDA out of memory. Tried to allocate 1.65 GiB (GPU 0; 8.00 GiB total capacity; 5.00 GiB already allocated; 910.21 MiB free; 5.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
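As a side note, the error message itself suggests setting max_split_size_mb. A minimal sketch of how that could be tried (the value 128 is just a guess on my part, and it only helps with allocator fragmentation, not with the model simply being too large for 8 GB):

```python
# Hedged sketch: configure the CUDA caching allocator before CUDA is initialized.
# "max_split_size_mb:128" is an illustrative value, not a project default.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after setting the variable so the allocator picks it up
```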

After Googling, I know I need to reduce the batch size or the chunk size, but I don't know where in the project code to edit them, or what reduced values would be appropriate.
Also, between batch size and chunk size, which one should I reduce first?

thanks !!!!!!

Edit: I'm a beginner with NeRF. Apologies if this is a silly question :)

Hi.

Sorry for the late reply.

I think you can modify most of the hyper-parameters in the corresponding config files (*.yaml).

To reduce memory consumption, you could reduce the network size, the number of rays per training step (i.e., N_rays), or the number of samples per ray during volume rendering (N_samples and N_importance for coarse and fine sampling, respectively).
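The chunk size used when rendering full images also bounds peak memory. Here is a minimal sketch of the chunking idea (illustrative only, not the repo's exact batchify_rays code); a smaller chunk means smaller intermediate tensors at the cost of more iterations:

```python
import torch

def render_in_chunks(render_fn, rays_flat, chunk=1024 * 8):
    # Process rays in slices of size `chunk`; peak GPU memory scales roughly
    # with chunk * samples-per-ray rather than with the full set of rays.
    outputs = []
    for i in range(0, rays_flat.shape[0], chunk):
        outputs.append(render_fn(rays_flat[i:i + chunk]))
    return torch.cat(outputs, dim=0)
```

So halving the chunk value roughly halves the largest intermediate tensors built inside run_network, which is where your allocation failed.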

Hope this helps!

Thanks for your reply !!!

I have solved it by using a better GPU (an RTX 3090), but your reply still helps me a lot. If a larger model doesn't fit on my hardware next time, I will try the methods you suggested.

Thanks again !