Error while training
vardeep-sandhu opened this issue · 2 comments
vardeep-sandhu commented
Hey,
I am training the model from scratch on my __ with 12G of memory. I have reduced the batch size and the attention SIZE parameters (as suggested by the author) to the bare minimum, but I still keep hitting this error:
  File "train.py", line 211, in <module>
    main()
  File "train.py", line 182, in main
    merge_all_iters_to_one_epoch=args.merge_all_iters_to_one_epoch
  File "/automount_home_students/vsandhu/master_project_2/VOTR/tools/train_utils/train_utils.py", line 99, in train_model
    dataloader_iter=dataloader_iter
  File "/automount_home_students/vsandhu/master_project_2/VOTR/tools/train_utils/train_utils.py", line 19, in train_one_epoch
    batch = next(dataloader_iter)
  File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
AssertionError: Caught AssertionError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/kitti/kitti_dataset.py", line 433, in __getitem__
    data_dict = self.prepare_data(data_dict=input_dict)
  File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/dataset.py", line 142, in prepare_data
    data_dict=data_dict
  File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/processor/data_processor.py", line 127, in forward
    data_dict = cur_processor(data_dict=data_dict)
  File "/automount_home_students/vsandhu/master_project_2/VOTR/pcdet/datasets/processor/data_processor.py", line 62, in transform_points_to_voxels
    voxel_output = voxel_generator.generate(points)
  File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/spconv/utils/__init__.py", line 173, in generate
    or self._max_voxels, self._full_mean)
  File "/automount_home_students/vsandhu/anaconda3/envs/voxtr/lib/python3.6/site-packages/spconv/utils/__init__.py", line 69, in points_to_voxel
    assert block_filtering is False
AssertionError
Thanks in advance for the help
PointsCoder commented
@here-to-learn0 I haven't encountered this problem. My guess is that it is caused by an incompatible spconv version. You may refer to this issue for more information.
Btw, this code was tested with spconv v1.2.
vardeep-sandhu commented
@PointsCoder I followed the issue you referred to, and the model seems to be training now. However, the spconv version from that issue is v1.0, not the v1.2 that I had earlier.
Thanks for pointing it out.
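For anyone who lands here with the same assertion: since the fix in this thread came down to which spconv build was installed, a quick version check before launching train.py may save some time. The snippet below is only a rough sketch, not part of VOTR or spconv. The helper names get_installed_version and version_matches are made up for illustration, and it assumes spconv is registered with pip under the distribution name "spconv", which may not hold for every source build. Which version to expect is also left to you, since this thread mentions both v1.2 and v1.0.

```python
def get_installed_version(package_name):
    """Return the installed version string of a package, or None if it is not found."""
    try:
        from importlib import metadata  # available on Python 3.8+
    except ImportError:
        metadata = None

    if metadata is not None:
        try:
            return metadata.version(package_name)
        except metadata.PackageNotFoundError:
            return None

    # Fallback for older interpreters such as the Python 3.6 env in the traceback.
    import pkg_resources
    try:
        return pkg_resources.get_distribution(package_name).version
    except pkg_resources.DistributionNotFound:
        return None


def version_matches(installed, expected_prefix):
    """True if `installed` equals `expected_prefix` or is a sub-release of it."""
    return installed == expected_prefix or installed.startswith(expected_prefix + ".")


if __name__ == "__main__":
    # Assumption: "spconv" is the distribution name; "1.2" is just an example target.
    expected = "1.2"
    found = get_installed_version("spconv")
    if found is None:
        print("spconv does not appear to be installed under that name")
    elif version_matches(found, expected):
        print("spconv", found, "matches the expected", expected)
    else:
        print("spconv", found, "differs from the expected", expected)
```

The prefix comparison is deliberately strict about the dot, so a hypothetical "1.20" would not be mistaken for a "1.2" release.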