train error
Closed this issue · 2 comments
sihuanian-2 commented
When I run step 1.1 "Train the geometry field", the following error occurs:
IndexError: index 715640528 is out of bounds for dimension 0 with size 631917504
Details:
stop_semantic_grad:True
ignore_index: 0
Namespace(gpnerf=True, debug=False, val_type='val', logger_interval=10, separate_semantic=True, freeze_geo=False, dataset_type='memory_depth_dji', balance_weight=True, remove_cluster=True, use_subset=False, label_name_3d_to_2d='label_pc', start=-1, end=-1, check_depth=False, contract_new=True, use_plane=True, geo_init_method='idr', save_individual=False, continue_train=False, depth_dji_loss=False, depth_dji_type='mesh', sampling_mesh_guidance=False, wgt_air_sigma_loss=0, around_mesh_meter=5, wgt_depth_mse_loss=0.0, wgt_sigma_loss=0, sample_ray_num=1024, visual_normal=True, normal_loss=False, wgt_nl1_loss=0.0001, wgt_ncos_loss=0.0001, depth_loss=False, wgt_depth_loss=0.0, auto_grad=False, decay_min=0.1, save_depth=False, fushi=False, enable_instance=False, num_instance_classes=50, wgt_instance_loss=1, freeze_semantic=False, instance_name='instances_mask_0.001', instance_loss_mode='linear_assignment', cached_centroids_path=None, use_dbscan=True, wgt_concentration_loss=1, crossview_process_path='zyq/test', crossview_all=False, stop_semantic_grad=True, ignore_index=0, label_name='fusion', enable_semantic=False, num_semantic_classes=5, num_layers_semantic_hidden=3, semantic_layer_dim=128, wgt_sem_loss=1, network_type='gpnerf_nr3d', clip_grad_max=0, num_layers=2, num_layers_color=3, layer_dim=64, appearance_dim=48, geo_feat_dim=15, num_levels=16, base_resolution=16, desired_resolution=8192, log2_hashmap_size=22, hash_feat_dim=2, writer_log=True, wandb_id='None', wandb_run_name='test', use_scaling=False, contract_norm='l2', contract_bg_len=1, aabb_bound=1.6, train_iterations=200000, val_interval=50000, ckpt_interval=50000, model_chunk_size=10485760, ray_chunk_size=20480, batch_size=10240, coarse_samples=128, fine_samples=128, ckpt_path=None, config_file='configs/yingrenshi.yaml', chunk_paths=None, desired_chunks=20, num_chunks=20, disk_flush_size=10000000, train_every=1, cluster_mask_path=None, container_path=None, bg_layer_dim=256, near=1, far=None, 
ray_altitude_range=[-95.0, 54.0], train_scale_factor=4, val_scale_factor=4, pos_xyz_dim=10, pos_dir_dim=4, layers=8, skip_layers=[4], affine_appearance=False, use_cascade=False, train_mega_nerf=None, boundary_margin=1.15, all_val=False, cluster_2d=False, center_pixels=True, shifted_softplus=True, image_pixel_batch_size=8192, perturb=1.0, noise_std=1.0, lr=0.001, lr_decay_factor=1, bg_nerf=False, ellipse_scale_factor=1.1, ellipse_bounds=True, resume_ckpt_state=True, amp=True, detect_anomalies=False, random_seed=42, render_zyq=False, render_zyq_far_view='render_far0.3', exp_name='logs/yingrenshi_geo', dataset_path='Yingrenshi')
Origin: tensor([-9.4238e+01, -1.2068e+06, -2.3388e+06]), scale factor: 334.7266229371708
Ray bounds: 0.0029875125893039675, 100000.0
Ray altitude range in [-1, 1] space: [tensor(-0.0023, dtype=torch.float64), tensor(0.4429, dtype=torch.float64)]
Ray altitude range in metric space: [-95.0, 54.0]
Using 854 train images and 15 val images
Camera range in metric space: tensor([-1.3072e+02, -1.2071e+06, -2.3390e+06]) tensor([-5.7758e+01, -1.2065e+06, -2.3385e+06])
Camera range in [-1, 1] space: tensor([-0.1090, -0.9933, -0.6832]) tensor([0.1090, 0.9933, 0.6832])
Camera range in [-1, 1] space with ray altitude range: tensor([-0.1090, -0.9933, -0.6832]) tensor([0.4429, 0.9933, 0.6832])
Sphere center: tensor([0.1669, 0.0000, 0.0000], device='cuda:0'), radius: tensor([0.4785, 1.7223, 1.1847], device='cuda:0')
2024-04-16 15:57:46,025-rk0-utils.py#20:kaolin is not installed. OctreeAS / ForestAS disabled.
2024-04-16 15:57:46,025-rk0-lotd_encoding.py#35:tensorly is not installed.
the dataset_type is :memory_depth_dji
layer_dim: 64
semantic layer_dim: 128
use two mlp
2024-04-16 15:57:46,031-rk0-lotd_cfg.py#129:NGP auto-computed config: layer resolutions: [[24, 190, 130], [34, 263, 180], [46, 363, 250], [64, 502, 345], [89, 694, 477], [124, 959, 660], [171, 1326, 912], [236, 1833, 1260], [327, 2533, 1742], [452, 3501, 2408], [625, 4838, 3328], [864, 6686, 4599], [1194, 9241, 6356], [1650, 12771, 8784], [2280, 17649, 12140], [3152, 24391, 16778], [4356, 33709, 23187]]
2024-04-16 15:57:46,031-rk0-lotd_cfg.py#130:NGP auto-computed config: layer types: ['Dense', 'Dense', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash']
2024-04-16 15:57:46,031-rk0-lotd_cfg.py#131:NGP auto-computed config: layer n_feats: [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
2024-04-16 15:57:46,031-rk0-lotd_cfg.py#132:NGP auto-computed config: expected num_params=134217728; generated: 130233840 [0.97x]
Hash and Plane
Hash and Plane
the parameters of whole model: total: 151771568, fg: 151771568, bg: 0
no using wandb
Loading data
0%| | 0/854 [00:00<?, ?it/s]/data/Aerial_lifting/./gp_nerf/image_metadata.py:42: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:206.)
return torch.ByteTensor(np.asarray(rgbs))
/data/Aerial_lifting/./gp_nerf/datasets/memory_dataset_depth_dji.py:89: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
labels.append(torch.tensor(label, dtype=torch.int))
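The two UserWarnings above are benign but easy to silence. A minimal sketch of the fixes PyTorch itself suggests, using hypothetical stand-in data (the real `rgbs`/`label` values come from the dataset loaders):

```python
import numpy as np
import torch

# Warning 1 (image_metadata.py): the NumPy array is non-writable.
# np.array() (unlike np.asarray()) makes a writable copy before the
# tensor wraps the memory.
rgbs = [np.zeros((2, 2, 3), dtype=np.uint8)]  # hypothetical stand-in data
rgb_tensor = torch.from_numpy(np.array(rgbs))

# Warning 2 (memory_dataset_depth_dji.py): `label` is already a tensor,
# so clone/detach it instead of calling torch.tensor() on it.
label = torch.zeros(4)  # hypothetical stand-in
labels = []
labels.append(label.clone().detach().to(torch.int))
```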
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 854/854 [02:26<00:00, 5.83it/s]
load_subset: 0
Finished loading data
0%| | 0/200000 [00:00<?, ?it/s]Traceback (most recent call last):
File "/data/Aerial_lifting/gp_nerf/train.py", line 67, in <module>
main(_get_train_opts())
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/data/Aerial_lifting/gp_nerf/train.py", line 63, in main
Runner(hparams).train()
File "/data/Aerial_lifting/./gp_nerf/runner_gpnerf.py", line 491, in train
for dataset_index, item in enumerate(data_loader):
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/data/Aerial_lifting/./gp_nerf/datasets/memory_dataset_depth_dji.py", line 122, in __getitem__
item['labels'] = self._labels[idx].int()
IndexError: index 715640528 is out of bounds for dimension 0 with size 631917504
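Reading the traceback: the sampled flat pixel index (715640528) exceeds the length of the flattened `_labels` tensor (631917504), which suggests the ray/pixel indices and the label buffer were built over different numbers of pixels (e.g. a scale-factor mismatch between RGB images and label maps). A minimal sketch of a bounds guard that would catch this at `__getitem__` time; the function name and stand-in sizes are hypothetical, not the repository's actual code:

```python
import torch

def labels_in_bounds(idx: torch.Tensor, labels: torch.Tensor) -> bool:
    """Return True when every sampled flat pixel index is a valid row of `labels`."""
    return int(idx.max()) < labels.shape[0]

_labels = torch.zeros(631, dtype=torch.uint8)  # stand-in for the flattened labels
good = torch.tensor([0, 630])
bad = torch.tensor([0, 715])                   # mimics the reported out-of-range index

assert labels_in_bounds(good, _labels)
assert not labels_in_bounds(bad, _labels)      # would raise IndexError if indexed
```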
zyqz97 commented
Thanks for pointing this out! Fixing it now.
sihuanian-2 commented
thanks a lot