"out of memory" when in eval() mode

Question

ahmed-shariff opened this issue 7 years ago · 1 comment

I am trying to run the model on some of my own images by importing it as a module. When I set the model to eval mode, I get the following error:

THCudaCheck FAIL file=/home/amsha/builds/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "pipeline.py", line 601, in <module>
    main(parser.parse_args())
  File "pipeline.py", line 594, in main
    _main()
  File "pipeline.py", line 141, in _main
    train_output = current_model.train_model(dataloader.get_train_input_fn(), classification_steps)
  File "/home/amsha/Research/FoodClassification/models/models_fasterrcnn.py", line 288, in train_model
    rois_label = self.model(input_var, iinfo_var, gtbox_var, nmbox_var)
  File "/home/amsha/virtualenv/torch-master-13022018/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/amsha/Research/faster-rcnn.pytorch/lib/model/faster_rcnn/faster_rcnn.py", line 77, in forward
    pooled_feat = self.RCNN_roi_crop(base_feat, Variable(grid_yx).detach())
  File "/home/amsha/virtualenv/torch-master-13022018/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/amsha/Research/faster-rcnn.pytorch/lib/model/roi_crop/modules/roi_crop.py", line 8, in forward
    return RoICropFunction()(input1, input2)
  File "/home/amsha/Research/faster-rcnn.pytorch/lib/model/roi_crop/functions/roi_crop.py", line 11, in forward
    output = input2.new(input2.size()[0], input1.size()[1], input2.size()[1], input2.size()[2]).zero_()
RuntimeError: cuda runtime error (2) : out of memory at /home/amsha/builds/pytorch/aten/src/THC/generic/THCStorage.cu:58

The code block where this originates:

        self.model = resnet([0,1], 50)
        self.model.create_architecture()
        ...
        # With model.train(False) or model.eval(), I get the above error.
        # The problem does not occur when the line below is model.train() instead.
        self.model.train(False)
        for root_dir, dirs, files in os.walk("../Datasets/captured/processed_v2/"):
            for f in files:
                img = io.imread(os.path.join(root_dir, f))
                i = [
                    torchvision.transforms.ToTensor()(img).unsqueeze(0),  # image, used as input_var
                    None,
                    torch.Tensor(0),                                      # used as nmbox_var
                    torch.Tensor(img.shape[:2]).unsqueeze(0),             # used as iinfo_var
                    torch.Tensor([[[1, 2, 3, 4, 1]]]),                    # used as gtbox_var
                ]
                if self.use_cuda:
                    i = [i_.cuda() if i_ is not None else i_ for i_ in i]
                
                input_var = torch.autograd.Variable(i[0].float())
                nmbox_var = torch.autograd.Variable(i[2])
                iinfo_var = torch.autograd.Variable(i[3])
                gtbox_var = torch.autograd.Variable(i[4])
                rois, cls_prob, bbox_pred, \
                  rpn_loss_cls, rpn_loss_bbox, \
                  RCNN_loss_cls, RCNN_loss_bbox, \
                  rois_label = self.model(input_var, iinfo_var, gtbox_var, nmbox_var)

                print(rpn_loss_cls, rpn_loss_bbox,
                      RCNN_loss_cls, RCNN_loss_bbox)
                continue

I am using ResNet-50 in the model.
pytorch version: '0.4.0a0+b608ea9'
faster-rcnn_pytorch from: 28ee76d6ae868ca43c4e38bedbafd82d919f601a
GPU: GTX 1050
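
For reference, the allocation that fails in the last traceback frame is a dense 4-D tensor of shape (input2.size(0), input1.size(1), input2.size(1), input2.size(2)). The sketch below only does the arithmetic for how large such a buffer is. The helper name `roi_crop_output_bytes`, the RoI counts, the channel count, the grid size, and the 4-byte element size are all illustrative assumptions, not values taken from this issue or confirmed from the repo's configuration.

```python
def roi_crop_output_bytes(num_rois, channels, grid_h, grid_w, bytes_per_element=4):
    """Bytes needed for a dense (num_rois, channels, grid_h, grid_w) tensor.

    Hypothetical helper: it mirrors the shape of the failing allocation
    and assumes 4-byte (float32) elements by default.
    """
    return num_rois * channels * grid_h * grid_w * bytes_per_element


# Illustrative values only (assumptions, not read from the issue):
# two different RoI counts, 1024 feature channels, a 14x14 sampling grid.
for label, num_rois in (("128 RoIs", 128), ("300 RoIs", 300)):
    size = roi_crop_output_bytes(num_rois, channels=1024, grid_h=14, grid_w=14)
    print(f"{label}: {size / 2**20:.1f} MiB")
```

The buffer grows linearly with the first dimension, so if the model passes more RoIs through this layer in eval mode than in train mode, that alone could raise the peak. This single tensor is also only part of the footprint: intermediate activations kept for autograd add to it, which matters on a small card.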

Answer 1 · 2018-04-18T07:27:10.000Z

This is embarrassing! This is the wrong repo. So sorry mate!