ashawkey/RAD-NeRF

Killed when using --preload 2

zhang010930 opened this issue · 3 comments

I saw in the documentation that you can use --preload 2 to make training faster, but when I run it on a server with a 3090 and 80 GB of RAM, the process gets killed. The documentation says 24 GB of memory is enough. Why is this?

root@autodl-container-bde111aa08-2431f782:/autodl-tmp/RAD-NeRF/RAD-NeRF-main# python main.py data/nv2zhongwen/ --workspace trial_nv2zhongwen/ -O --iters 200000 --preload 2
Namespace(H=450, O=True, W=450, amb_dim=2, asr=False, asr_model='cpierse/wav2vec2-large-xlsr-53-esperanto', asr_play=False, asr_save_feats=False, asr_wav='', att=2, aud='', bg_img='', bound=1, ckpt='latest', color_space='srgb', cuda_ray=True, data_range=[0, -1], density_thresh=10, density_thresh_torso=0.01, dt_gamma=0.00390625, emb=False, exp_eye=True, fbg=False, finetune_lips=False, fix_eye=-1, fovy=21.24, fp16=True, fps=50, gui=False, head_ckpt='', ind_dim=4, ind_dim_torso=8, ind_num=10000, iters=200000, l=10, lambda_amb=0.1, lr=0.005, lr_net=0.0005, m=50, max_ray_batch=4096, max_spp=1, max_steps=16, min_near=0.05, num_rays=65536, num_steps=16, offset=[0, 0, 0], part=False, part2=False, patch_size=1, path='data/nv2zhongwen/', preload=2, r=10, radius=3.35, scale=4, seed=0, smooth_eye=False, smooth_lips=False, smooth_path=False, smooth_path_window=7, test=False, test_train=False, torso=False, torso_shrink=0.8, train_camera=False, update_extra_interval=16, upsample_steps=0, workspace='trial_nv2zhongwen/')
[INFO] load 6821 train frames.
[INFO] load aud_features: torch.Size([7504, 44, 16])
Loading train data: 100%|████████████████████████████████████████████████████████████████████████████| 6821/6821 [01:21<00:00, 84.20it/s]
Killed
root@autodl-container-bde111aa08-2431f782:/autodl-tmp/RAD-NeRF/RAD-NeRF-main# python main.py data/nv2zhongwen/ --workspace trial_nv2zhongwen/ -O --iters 200000 --preload 1
Namespace(H=450, O=True, W=450, amb_dim=2, asr=False, asr_model='cpierse/wav2vec2-large-xlsr-53-esperanto', asr_play=False, asr_save_feats=False, asr_wav='', att=2, aud='', bg_img='', bound=1, ckpt='latest', color_space='srgb', cuda_ray=True, data_range=[0, -1], density_thresh=10, density_thresh_torso=0.01, dt_gamma=0.00390625, emb=False, exp_eye=True, fbg=False, finetune_lips=False, fix_eye=-1, fovy=21.24, fp16=True, fps=50, gui=False, head_ckpt='', ind_dim=4, ind_dim_torso=8, ind_num=10000, iters=200000, l=10, lambda_amb=0.1, lr=0.005, lr_net=0.0005, m=50, max_ray_batch=4096, max_spp=1, max_steps=16, min_near=0.05, num_rays=65536, num_steps=16, offset=[0, 0, 0], part=False, part2=False, patch_size=1, path='data/nv2zhongwen/', preload=1, r=10, radius=3.35, scale=4, seed=0, smooth_eye=False, smooth_lips=False, smooth_path=False, smooth_path_window=7, test=False, test_train=False, torso=False, torso_shrink=0.8, train_camera=False, update_extra_interval=16, upsample_steps=0, workspace='trial_nv2zhongwen/')
[INFO] load 6821 train frames.
[INFO] load aud_features: torch.Size([7504, 44, 16])
Loading train data: 100%|████████████████████████████████████████████████████████████████████████████| 6821/6821 [01:26<00:00, 78.93it/s]
Killed


Experiencing the same issue when I use --preload 2.

What happens is that while the dataset is loading, RAM usage keeps increasing until it maxes out and the process gets killed.

Been stuck on this for a few hours now.
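One quick way to confirm it is an out-of-memory kill is to check dmesg for an oom-kill message, or to watch the training process's RSS from a second shell while the data loads. A minimal sketch using psutil (not a RAD-NeRF dependency, just an illustration):

    # Minimal sketch: print resident memory of a PID every few seconds.
    # Requires psutil (pip install psutil); pass the PID of the main.py process.
    import time
    import psutil

    def watch(pid, interval=5.0):
        p = psutil.Process(pid)
        try:
            while True:
                print(f"RSS: {p.memory_info().rss / 1e9:.1f} GB")
                time.sleep(interval)
        except psutil.NoSuchProcess:
            print("process exited (or was killed)")

In my case the RSS climbed steadily during "Loading train data" until it hit the machine's limit.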

After further investigation, I found the following lines of code to be the culprit:

RAD-NeRF/nerf/provider.py

Lines 530 to 532 in 0de5ed2

    if self.preload > 0:
        self.images = torch.from_numpy(np.stack(self.images, axis=0)) # [N, H, W, C]
        self.torso_img = torch.from_numpy(np.stack(self.torso_img, axis=0)) # [N, H, W, C]

Because np.stack allocates memory for the new stacked array while the existing per-frame arrays are still held in memory, this maxed out my 80 GB of RAM once it got to line 532.
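For a rough sense of scale, here is a back-of-the-envelope sketch of what is resident at the moment line 532 runs. The 512x512 resolution and the 4-channel torso frames are assumptions on my part; substitute your own numbers:

    # Rough peak-RAM estimate at the moment line 532 (the torso np.stack) runs.
    # The 512x512 resolution and 4-channel torso frames are assumed, not measured.
    N = 6821                          # train frames, from the log above
    H, W = 512, 512                   # assumed frame resolution
    f32 = 4                           # bytes per float32

    head_frame = H * W * 3 * f32      # one entry of self.images
    torso_frame = H * W * 4 * f32     # one entry of self.torso_img

    stacked_images = N * head_frame   # result of line 531, already built
    torso_list = N * torso_frame      # per-frame arrays still referenced by the list
    stacked_torso = N * torso_frame   # the extra copy np.stack is building

    peak_gb = (stacked_images + torso_list + stacked_torso) / 1e9
    print(f"~{peak_gb:.0f} GB resident while stacking the torso frames")

With those assumptions this lands right around 80 GB, which matches where the process got killed for me.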

See more details in the answer to this Stack Overflow question: https://stackoverflow.com/questions/31268998/how-to-merge-two-large-numpy-arrays-if-slicing-doesnt-resolve-memory-error

The solution was to rewrite NerfDataset to use the technique from that answer: preallocate the full array once and fill it in place instead of stacking a list of per-frame arrays.

Here is the important snippet to give an idea; I did this for both self.images and self.torso_img:

          # TODO: dynamically determine the last dim of the shape, it can be 4 or 3
          self.images = np.empty((len(frames), self.H, self.W, 3), dtype=np.float32) # [N, H, W, C]
          for index, f in enumerate(tqdm.tqdm(frames, desc=f'Preloading images {type} data')):
              f_path = os.path.join(self.root_path, 'gt_imgs', str(f['img_id']) + '.jpg')
              image = cv2.imread(f_path, cv2.IMREAD_UNCHANGED) # [H, W, 3] or [H, W, 4]
              image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
              image = image.astype(np.float32) / 255 # [H, W, 3/4]
              self.images[index] = image # write into the preallocated array, no extra copy

          self.images = torch.from_numpy(self.images) # [N, H, W, C]
          if self.preload > 1:
              self.images = self.images.to(torch.half).to(self.device)
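For completeness, a sketch of the same preallocate-and-fill technique applied to self.torso_img might look like the following. It assumes the torso frames are 4-channel PNGs under a 'torso_imgs' directory, as in the original loader; adapt the path and channel count if your data differs.

    # Same idea for self.torso_img: preallocate once, then fill in place.
    # Assumes 4-channel (RGBA) PNGs under 'torso_imgs'; adjust if your data differs.
    self.torso_img = np.empty((len(frames), self.H, self.W, 4), dtype=np.float32) # [N, H, W, 4]
    for index, f in enumerate(tqdm.tqdm(frames, desc=f'Preloading torso {type} data')):
        t_path = os.path.join(self.root_path, 'torso_imgs', str(f['img_id']) + '.png')
        torso_img = cv2.imread(t_path, cv2.IMREAD_UNCHANGED) # [H, W, 4]
        torso_img = cv2.cvtColor(torso_img, cv2.COLOR_BGRA2RGBA)
        self.torso_img[index] = torso_img.astype(np.float32) / 255

    self.torso_img = torch.from_numpy(self.torso_img) # [N, H, W, 4]
    if self.preload > 1:
        self.torso_img = self.torso_img.to(torch.half).to(self.device)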

Context
I was training on a video roughly 5 minutes long on an A100 80G, with 12 CPU cores and 80 GB of RAM.

PS:
I will try to make time to open a PR for this when I get the chance.
cc: @ashawkey

May I ask which specific lines you modified? Alternatively, could you please send your NerfDataset to my email, 907551572@qq.com? Your help is very important to me, and I hope to receive it.