RuntimeError when training and evaluating on D2SA
Noiredd opened this issue · 3 comments
Thanks for the interesting paper and the easy-to-run implementation!
I tried following your training guide but got surprisingly stuck on the first step (D2SA training):
python tools/train_net.py --config-file configs/D2SA-AmodalSegmentation/mask_rcnn_R_50_FPN_1x_parallel_CtRef_VAR_SPRef_SPRet_FM.yaml
I modified the config to make a quick test, changing only the SOLVER entry in Base-RCNN-FPN-D2SA.yaml as follows:
SOLVER:
  IMS_PER_BATCH: 1
  BASE_LR: 0.005
  MAX_ITER: 7000
  OPT_TYPE: "SGD"
(Single image per batch, fewer iterations, no LR steps, and no checkpoints.) I figured this kind of change would let me run the model quickly to check that everything works, and then I'd run the whole thing overnight.
After successfully training for 7000 iterations, the process crashed at evaluation with the following traceback:
[08/18 15:54:49 d2.evaluation.evaluator]: Start embedding inference on training 1562 images
[08/18 15:54:51 d2.engine.hooks]: Overall training speed: 6997 iterations in 0:38:35 (0.3309 s / it)
[08/18 15:54:51 d2.engine.hooks]: Total training time: 0:43:57 (0:05:22 on hooks)
Traceback (most recent call last):
  File "../../repo/tools/train_net.py", line 173, in <module>
    args=(args,),
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/engine/launch.py", line 51, in launch
    main_func(*args)
  File "../../repo/tools/train_net.py", line 161, in main
    return trainer.train()
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/engine/defaults.py", line 416, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/engine/train_loop.py", line 133, in train
    self.after_step()
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/engine/train_loop.py", line 151, in after_step
    h.after_step()
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/engine/hooks.py", line 325, in after_step
    results = self._func()
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/engine/defaults.py", line 366, in test_and_save_results
    self._last_eval_results = self.test(self.cfg, self.model)
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/engine/defaults.py", line 528, in test
    model = embedding_inference_on_train_dataset(model, embedding_dataloader)
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/evaluation/evaluator.py", line 259, in embedding_inference_on_train_dataset
    model(inputs)
  File "/home/pdolata/Programy/detectron_amodal/davenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/modeling/meta_arch/rcnn.py", line 109, in forward
    return self.inference(batched_inputs, do_postprocess=do_postprocess)
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/modeling/meta_arch/rcnn.py", line 166, in inference
    results, _ = self.roi_heads(images, features, proposals, gt_instances)
  File "/home/pdolata/Programy/detectron_amodal/davenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/modeling/roi_heads/roi_heads.py", line 862, in forward
    pred_instances = self.forward_with_given_boxes(features, pred_instances)
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/modeling/roi_heads/roi_heads.py", line 891, in forward_with_given_boxes
    instances = self._forward_mask(features, instances)
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/modeling/roi_heads/roi_heads.py", line 1008, in _forward_mask
    mask_logits, _ = self.mask_head(mask_features, instances)
  File "/home/pdolata/Programy/detectron_amodal/davenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/modeling/roi_heads/mask_head.py", line 817, in forward
    k=self.SPk).detach()
  File "/home/pdolata/Programy/detectron_amodal/repo/detectron2/modeling/roi_heads/recon_net.py", line 323, in nearest_decode
    indices = torch.topk(- distances, k)[1]
RuntimeError: invalid argument 5: k not in range for dimension at /pytorch/aten/src/THC/generic/THCTensorTopK.cu:23
I know this error pops up when torch.topk is called with k greater than the size of the dimension it searches over, but honestly I have no idea where to look for the source of this in the model configs. Any hints?
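For reference, a minimal sketch of the failure mode with made-up shapes (in the repo, the distances are computed in recon_net.py and k comes from SPk):

import torch

# Minimal repro: torch.topk raises when k exceeds the size of the searched
# dimension. Shapes here are hypothetical; in the repo, `distances` compares
# query features against the codebook entries.
distances = torch.rand(10, 4)           # assumed (num_queries, codebook_entries)
ok = torch.topk(-distances, k=3)[1]     # works: k <= 4
try:
    torch.topk(-distances, k=8)[1]      # k > 4 -> RuntimeError ("k not in range for dimension")
except RuntimeError as e:
    print("topk failed:", e)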
"indices = torch.topk(- distances, k)[1]" aims to find the top k nearest category-specific features in the codebook. It seems that the dimension of the codebook is not correct?
I suggest checking the dimension of the "distances" and "codebook".
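A minimal sketch of that check, again with made-up shapes (the real codebook and feature dimensions depend on the config, and k corresponds to SPk):

import torch

# Hypothetical dimensions; the real values depend on the repo's config and codebook file.
feat_dim, codebook_entries, num_queries = 16, 5, 3
codebook = torch.rand(codebook_entries, feat_dim)
queries = torch.rand(num_queries, feat_dim)
distances = torch.cdist(queries, codebook)     # (num_queries, codebook_entries)

k = 7                                          # pretend the configured k is larger than the codebook
print("codebook:", tuple(codebook.shape), "distances:", tuple(distances.shape))
if k > distances.shape[-1]:
    # Either the codebook was built/loaded with too few entries, or k (SPk) is too large.
    k = distances.shape[-1]
indices = torch.topk(-distances, k)[1]         # safe now: k <= codebook_entries
print("nearest codebook indices:", indices)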
Just found this ancient issue, which, by the way, never occurred again after I reinstalled your repo. Must've been something with the dependencies.
"indices = torch.topk(- distances, k)[1]" aims to find the top k nearest category-specific features in the codebook. It seems that the dimension of the codebook is not correct? I suggest checking the dimension of the "distances" and "codebook".
Could you tell me how to use my own dataset to generate the codebook?