Error while training

Question

vardeep-sandhu opened this issue 3 years ago · 2 comments

Hey,

I am training the model from scratch on my __ with 12 GB of memory. I have decreased the batch size and the attention SIZE parameters (as suggested by the author) to the bare minimum, but I still keep hitting this error:

 File "train.py", line 211, in <module>
   main()
 File "train.py", line 182, in main
   merge_all_iters_to_one_epoch=args.merge_all_iters_to_one_epoch
 File "/automount_home_students/vsandhu/master_project_2/VOTR/tools/train_utils/train_utils.py", line 99, in train_model
   dataloader_iter=dataloader_iter
 File "/automount_home_students/vsandhu/master_project_2/VOTR/tools/train_utils/train_utils.py", line 19, in train_one_epoch
   batch = next(dataloader_iter)
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
   data = self._next_data()
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
   return self._process_data(data)
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
   data.reraise()
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/_utils.py", line 425, in reraise
   raise self.exc_type(msg)
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
   data = fetcher.fetch(index)
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
   data = [self.dataset[idx] for idx in possibly_batched_index]
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
   data = [self.dataset[idx] for idx in possibly_batched_index]
 File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/kitti/kitti_dataset.py", line 433, in __getitem__
   data_dict = self.prepare_data(data_dict=input_dict)
 File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/dataset.py", line 142, in prepare_data
   data_dict=data_dict
 File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/processor/data_processor.py", line 127, in forward
   data_dict = cur_processor(data_dict=data_dict)
 File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/processor/data_processor.py", line 62, in transform_points_to_voxels
   voxel_output = voxel_generator.generate(points)
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/spconv/utils/__init__.py", line 173, in generate
   or self._max_voxels, self._full_mean)
 File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/spconv/utils/__init__.py", line 69, in points_to_voxel
   assert block_filtering is False
AssertionError

Thanks in advance for the help.

Answer 1 · 2021-10-13T12:50:59.000Z

@here-to-learn0 I didn't encounter this problem. My guess is that it is caused by an incompatible spconv version. You may refer to this issue for more information.

By the way, this code was tested with spconv v1.2.
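To check which spconv release is actually installed in the active environment, something like the following may help. It is only a sketch: the distribution name "spconv" is an assumption (some builds may register under a different name), and the helper names `installed_version` and `matches_release` are made up for illustration. It uses the standard library where available and falls back to `pkg_resources` on older interpreters such as Python 3.6.

```python
def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            return None

    # Fallback for older interpreters (e.g. Python 3.6); requires setuptools.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        return None


def matches_release(version, wanted):
    """True if version is exactly `wanted` or a patch release of it (e.g. 1.2.1 for 1.2)."""
    if version is None:
        return False
    return version == wanted or version.startswith(wanted + ".")


# "spconv" as the distribution name is an assumption; adjust it if your build differs.
found = installed_version("spconv")
if found is None:
    print("No distribution named 'spconv' is registered in this environment.")
elif matches_release(found, "1.2"):
    print("spconv", found, "matches the 1.2 release line.")
else:
    print("spconv", found, "is installed; it differs from the 1.2 release line.")
```

Run it inside the same conda environment used for training, since the result depends on which interpreter executes it.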

Answer 2 · 2021-10-15T12:33:26.000Z

@PointsCoder I followed the issue you referred to, and the model seems to be training now. However, the spconv version from that issue is v1.0, not the v1.2 I had earlier.

Thanks for pointing it out, though.