train error
Closed this issue · 2 comments
sihuanian-2 commented
When I run step 1.1 "Train the geometry field", the following error occurs:
IndexError: index 715640528 is out of bounds for dimension 0 with size 631917504
Details:
stop_semantic_grad:True
ignore_index: 0
Namespace(gpnerf=True, debug=False, val_type='val', logger_interval=10, separate_semantic=True, freeze_geo=False, dataset_type='memory_depth_dji', balance_weight=True, remove_cluster=True, use_subset=False, label_name_3d_to_2d='label_pc', start=-1, end=-1, check_depth=False, contract_new=True, use_plane=True, geo_init_method='idr', save_individual=False, continue_train=False, depth_dji_loss=False, depth_dji_type='mesh', sampling_mesh_guidance=False, wgt_air_sigma_loss=0, around_mesh_meter=5, wgt_depth_mse_loss=0.0, wgt_sigma_loss=0, sample_ray_num=1024, visual_normal=True, normal_loss=False, wgt_nl1_loss=0.0001, wgt_ncos_loss=0.0001, depth_loss=False, wgt_depth_loss=0.0, auto_grad=False, decay_min=0.1, save_depth=False, fushi=False, enable_instance=False, num_instance_classes=50, wgt_instance_loss=1, freeze_semantic=False, instance_name='instances_mask_0.001', instance_loss_mode='linear_assignment', cached_centroids_path=None, use_dbscan=True, wgt_concentration_loss=1, crossview_process_path='zyq/test', crossview_all=False, stop_semantic_grad=True, ignore_index=0, label_name='fusion', enable_semantic=False, num_semantic_classes=5, num_layers_semantic_hidden=3, semantic_layer_dim=128, wgt_sem_loss=1, network_type='gpnerf_nr3d', clip_grad_max=0, num_layers=2, num_layers_color=3, layer_dim=64, appearance_dim=48, geo_feat_dim=15, num_levels=16, base_resolution=16, desired_resolution=8192, log2_hashmap_size=22, hash_feat_dim=2, writer_log=True, wandb_id='None', wandb_run_name='test', use_scaling=False, contract_norm='l2', contract_bg_len=1, aabb_bound=1.6, train_iterations=200000, val_interval=50000, ckpt_interval=50000, model_chunk_size=10485760, ray_chunk_size=20480, batch_size=10240, coarse_samples=128, fine_samples=128, ckpt_path=None, config_file='configs/yingrenshi.yaml', chunk_paths=None, desired_chunks=20, num_chunks=20, disk_flush_size=10000000, train_every=1, cluster_mask_path=None, container_path=None, bg_layer_dim=256, near=1, far=None, 
ray_altitude_range=[-95.0, 54.0], train_scale_factor=4, val_scale_factor=4, pos_xyz_dim=10, pos_dir_dim=4, layers=8, skip_layers=[4], affine_appearance=False, use_cascade=False, train_mega_nerf=None, boundary_margin=1.15, all_val=False, cluster_2d=False, center_pixels=True, shifted_softplus=True, image_pixel_batch_size=8192, perturb=1.0, noise_std=1.0, lr=0.001, lr_decay_factor=1, bg_nerf=False, ellipse_scale_factor=1.1, ellipse_bounds=True, resume_ckpt_state=True, amp=True, detect_anomalies=False, random_seed=42, render_zyq=False, render_zyq_far_view='render_far0.3', exp_name='logs/yingrenshi_geo', dataset_path='Yingrenshi')
Origin: tensor([-9.4238e+01, -1.2068e+06, -2.3388e+06]), scale factor: 334.7266229371708
Ray bounds: 0.0029875125893039675, 100000.0
Ray altitude range in [-1, 1] space: [tensor(-0.0023, dtype=torch.float64), tensor(0.4429, dtype=torch.float64)]
Ray altitude range in metric space: [-95.0, 54.0]
Using 854 train images and 15 val images
Camera range in metric space: tensor([-1.3072e+02, -1.2071e+06, -2.3390e+06]) tensor([-5.7758e+01, -1.2065e+06, -2.3385e+06])
Camera range in [-1, 1] space: tensor([-0.1090, -0.9933, -0.6832]) tensor([0.1090, 0.9933, 0.6832])
Camera range in [-1, 1] space with ray altitude range: tensor([-0.1090, -0.9933, -0.6832]) tensor([0.4429, 0.9933, 0.6832])
Sphere center: tensor([0.1669, 0.0000, 0.0000], device='cuda:0'), radius: tensor([0.4785, 1.7223, 1.1847], device='cuda:0')
2024-04-16 15:57:46,025-rk0-utils.py#20:kaolin is not installed. OctreeAS / ForestAS disabled.
2024-04-16 15:57:46,025-rk0-lotd_encoding.py#35:tensorly is not installed.
the dataset_type is :memory_depth_dji
layer_dim: 64
semantic layer_dim: 128
use two mlp
2024-04-16 15:57:46,031-rk0-lotd_cfg.py#129:NGP auto-computed config: layer resolutions: [[24, 190, 130], [34, 263, 180], [46, 363, 250], [64, 502, 345], [89, 694, 477], [124, 959, 660], [171, 1326, 912], [236, 1833, 1260], [327, 2533, 1742], [452, 3501, 2408], [625, 4838, 3328], [864, 6686, 4599], [1194, 9241, 6356], [1650, 12771, 8784], [2280, 17649, 12140], [3152, 24391, 16778], [4356, 33709, 23187]]
2024-04-16 15:57:46,031-rk0-lotd_cfg.py#130:NGP auto-computed config: layer types: ['Dense', 'Dense', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash', 'Hash']
2024-04-16 15:57:46,031-rk0-lotd_cfg.py#131:NGP auto-computed config: layer n_feats: [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
2024-04-16 15:57:46,031-rk0-lotd_cfg.py#132:NGP auto-computed config: expected num_params=134217728; generated: 130233840 [0.97x]
Hash and Plane
Hash and Plane
the parameters of whole model: total: 151771568, fg: 151771568, bg: 0
no using wandb
Loading data
0%| | 0/854 [00:00<?, ?it/s]/data/Aerial_lifting/./gp_nerf/image_metadata.py:42: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:206.)
return torch.ByteTensor(np.asarray(rgbs))
/data/Aerial_lifting/./gp_nerf/datasets/memory_dataset_depth_dji.py:89: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
labels.append(torch.tensor(label, dtype=torch.int))
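The two UserWarnings above are benign but easy to silence. A minimal sketch of the fixes PyTorch itself suggests, using hypothetical stand-in data (the real `rgbs`/`label` values come from the dataset loaders):

```python
import numpy as np
import torch

# Warning 1 (image_metadata.py): the NumPy array is non-writable.
# np.array() (unlike np.asarray()) makes a writable copy before the
# tensor wraps the memory.
rgbs = [np.zeros((2, 2, 3), dtype=np.uint8)]  # hypothetical stand-in data
rgb_tensor = torch.from_numpy(np.array(rgbs))

# Warning 2 (memory_dataset_depth_dji.py): `label` is already a tensor,
# so clone/detach it instead of calling torch.tensor() on it.
label = torch.zeros(4)  # hypothetical stand-in
labels = []
labels.append(label.clone().detach().to(torch.int))
```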
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 854/854 [02:26<00:00, 5.83it/s]
load_subset: 0
Finished loading data
0%| | 0/200000 [00:00<?, ?it/s]Traceback (most recent call last):
File "/data/Aerial_lifting/gp_nerf/train.py", line 67, in <module>
main(_get_train_opts())
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/data/Aerial_lifting/gp_nerf/train.py", line 63, in main
Runner(hparams).train()
File "/data/Aerial_lifting/./gp_nerf/runner_gpnerf.py", line 491, in train
for dataset_index, item in enumerate(data_loader):
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
data = self._next_data()
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/kpn/.conda/envs/aerial/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/data/Aerial_lifting/./gp_nerf/datasets/memory_dataset_depth_dji.py", line 122, in __getitem__
item['labels'] = self._labels[idx].int()
IndexError: index 715640528 is out of bounds for dimension 0 with size 631917504
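Reading the traceback: the sampled flat pixel index (715640528) exceeds the length of the flattened `_labels` tensor (631917504), which suggests the ray/pixel indices and the label buffer were built over different numbers of pixels (e.g. a scale-factor mismatch between RGB images and label maps). A minimal sketch of a bounds guard that would catch this at `__getitem__` time; the function name and stand-in sizes are hypothetical, not the repository's actual code:

```python
import torch

def labels_in_bounds(idx: torch.Tensor, labels: torch.Tensor) -> bool:
    """Return True when every sampled flat pixel index is a valid row of `labels`."""
    return int(idx.max()) < labels.shape[0]

_labels = torch.zeros(631, dtype=torch.uint8)  # stand-in for the flattened labels
good = torch.tensor([0, 630])
bad = torch.tensor([0, 715])                   # mimics the reported out-of-range index

assert labels_in_bounds(good, _labels)
assert not labels_in_bounds(bad, _labels)      # would raise IndexError if indexed
```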
zyqz97 commented
Thanks for pointing this out! Fixing it now.
sihuanian-2 commented
thanks a lot