hanoonaR/object-centric-ovd

segmentation fault when training COCO_OVD_base_PIS.yaml


Hello,

I ran into the following exception (edit: this happens when training the model on COCO; I followed all the steps in the dataset preparation instructions):

Exception has occurred: IndexError
Caught IndexError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/SERILOCAL/hai.xuanpham/anaconda3/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/SERILOCAL/hai.xuanpham/anaconda3/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/SERILOCAL/hai.xuanpham/anaconda3/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/SERILOCAL/hai.xuanpham/anaconda3/lib/python3.9/site-packages/detectron2/data/common.py", line 95, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "/home/SERILOCAL/hai.xuanpham/anaconda3/lib/python3.9/site-packages/detectron2/data/common.py", line 218, in __getitem__
    return self._lst[idx]
  File "/home/SERILOCAL/hai.xuanpham/anaconda3/lib/python3.9/site-packages/detectron2/data/common.py", line 151, in __getitem__
    start_addr = 0 if idx == 0 else self._addr[idx - 1].item()
IndexError: index 1412515039 is out of bounds for dimension 0 with size 205996
  File "/home/SERILOCAL/hai.xuanpham/object-centric-ovd/ovd/transforms/custom_dataset_dataloader.py", line 287, in __iter__
    for d in self.dataset:
  File "/home/SERILOCAL/hai.xuanpham/object-centric-ovd/train_net.py", line 135, in do_train
    for data, iteration in zip(data_loader, range(start_iter, max_iter)):
  File "/home/SERILOCAL/hai.xuanpham/object-centric-ovd/train_net.py", line 221, in main
    do_train(cfg, model, resume=args.resume)
  File "/home/SERILOCAL/hai.xuanpham/object-centric-ovd/train_net.py", line 230, in <module>
    launch(
IndexError: Caught IndexError in DataLoader worker process 1.

Something seems to be wrong on the data-loading side, but I haven't figured out what went wrong or where. Has anyone run into the same problem?
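
For anyone debugging the same thing: the failing line lives in detectron2's DatasetFromList (detectron2/data/common.py). Below is a minimal sketch, written from my understanding of that class rather than copied from it, of what the serialized lookup does: the dataset dicts are pickled into one flat byte buffer, and _addr holds the cumulative end offset of each item, so __getitem__ slices the buffer between _addr[idx - 1] and _addr[idx].

```python
import pickle

import numpy as np


class SerializedList:
    """Minimal sketch of the serialized dataset in detectron2's
    DatasetFromList (detectron2/data/common.py); simplified, not verbatim."""

    def __init__(self, lst):
        # Pickle each dataset dict into a flat uint8 buffer so that
        # DataLoader workers share one big array instead of a Python list.
        chunks = [np.frombuffer(pickle.dumps(x, protocol=-1), dtype=np.uint8)
                  for x in lst]
        # _addr[i] is the cumulative end offset of item i in the buffer;
        # len(_addr) == len(lst) (205996 in the traceback above).
        self._addr = np.cumsum(np.asarray([len(c) for c in chunks], dtype=np.int64))
        self._buf = np.concatenate(chunks)

    def __len__(self):
        return len(self._addr)

    def __getitem__(self, idx):
        # The failing line: idx comes from the sampler and must be
        # < len(self._addr). An idx of ~1.4e9 here is garbage.
        start = 0 if idx == 0 else self._addr[idx - 1].item()
        end = self._addr[idx].item()
        return pickle.loads(memoryview(self._buf[start:end]))
```

Since _addr has one entry per dataset dict (205996 here), an index around 1.4 billion cannot come from a well-formed sampler; my guess is that some shared state between worker processes is getting corrupted, which would also explain the intermittent segmentation faults.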

EDIT 2: I reduced NUM_WORKERS from 8 to 4, and then to 2, and the problem went away. I'm not really sure why.
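
If anyone else hits this: assuming train_net.py uses detectron2's standard argument parser (the launch() call in the traceback suggests it does), you can override the worker count from the command line instead of editing the yaml; adjust the config path to match your checkout:

```
python train_net.py --config-file configs/COCO_OVD_base_PIS.yaml DATALOADER.NUM_WORKERS 2
```

DATALOADER.NUM_WORKERS is the stock detectron2 config key; setting it to 0 disables worker processes entirely, which is a quick way to confirm whether the crash is worker-related.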