garrickbrazil/M3D-RPN

CUDA out of memory error

chinmaydharmatti opened this issue · 1 comment

Hi Garrick,

I am trying to replicate the results. When I execute the warmup script, I get a CUDA out of memory error. I know that reducing the batch size can help avoid this, but the batch size is already 2, which is relatively low. What else can be done to avoid this error? The full output from running the script is below.

```
chinmay@chinmay-Legion-Y540-15IRH:~/Desktop/M3D-RPN$ python scripts/train_rpn_3d.py --config=kitti_3d_multi_warmup
Setting up a new session...
Visdom successfully connected to server
Preloading imdb.
Found 3534 foreground and 178 empty images, weighted respectively as 1.05 and 0.00
Labels not used in training.. ['DontCare', 'Truck', 'Tram', 'Misc', 'Person_sitting']
conf: {
model: densenet121_3d_dilate
solver_type: sgd
lr: 0.004
momentum: 0.9
weight_decay: 0.0005
max_iter: 50000
snapshot_iter: 10000
display: 250
do_test: True
lr_policy: poly
lr_steps: None
lr_target: 4e-08
rng_seed: 2
cuda_seed: 2
image_means: [0.485, 0.456, 0.406]
image_stds: [0.229, 0.224, 0.225]
feat_stride: 16
has_3d: True
test_scale: 512
crop_size: [512, 1760]
mirror_prob: 0.5
distort_prob: -1
dataset_test: kitti_split1
datasets_train: [{'anno_fmt': 'kitti_det',
'im_ext': '.png',
'name': 'kitti_split1',
'scale': 1}]
use_3d_for_2d: True
percent_anc_h: [0.0625, 0.75]
min_gt_h: 32.0
max_gt_h: 384.0
min_gt_vis: 0.65
ilbls: ['Van', 'ignore']
lbls: ['Car', 'Pedestrian', 'Cyclist']
batch_size: 2
fg_image_ratio: 1.0
box_samples: 0.2
fg_fraction: 0.2
bg_thresh_lo: 0
bg_thresh_hi: 0.5
fg_thresh: 0.5
ign_thresh: 0.5
best_thresh: 0.35
nms_topN_pre: 3000
nms_topN_post: 40
nms_thres: 0.4
clip_boxes: False
test_protocol: kitti
test_db: kitti
test_min_h: 0
min_det_scales: [0, 0]
cluster_anchors: 0
even_anchors: 0
expand_anchors: 0
anchors: [[-0.5, -8.5, 15.5, 23.5, 51.969, 0.531,
1.713, 1.025, -0.799],
[-8.5, -8.5, 23.5, 23.5, 52.176, 1.618,
1.6, 3.811, -0.453],
[-16.5, -8.5, 31.5, 23.5, 48.334,
1.644, 1.529, 3.966, 0.673],
[-2.528, -12.555, 17.528, 27.555,
44.781, 0.534, 1.771, 0.971, 0.093],
[-12.555, -12.555, 27.555, 27.555,
44.704, 1.599, 1.569, 3.814, -0.187],
[-22.583, -12.555, 37.583, 27.555,
43.492, 1.621, 1.536, 3.91, 0.719],
[-5.069, -17.638, 20.069, 32.638,
34.666, 0.561, 1.752, 0.967, -0.384],
[-17.638, -17.638, 32.638, 32.638,
35.35, 1.567, 1.591, 3.81, -0.511],
[-30.207, -17.638, 45.207, 32.638,
37.128, 1.602, 1.529, 3.904, 0.452],
[-8.255, -24.01, 23.255, 39.01, 28.771,
0.613, 1.76, 0.98, 0.067],
[-24.01, -24.01, 39.01, 39.01, 28.331,
1.543, 1.592, 3.66, -0.811],
[-39.764, -24.01, 54.764, 39.01,
30.541, 1.626, 1.524, 3.908, 0.312],
[-12.248, -31.996, 27.248, 46.996,
23.011, 0.606, 1.758, 0.996, 0.208],
[-31.996, -31.996, 46.996, 46.996,
22.948, 1.51, 1.599, 3.419, -1.076],
[-51.744, -31.996, 66.744, 46.996,
25.0, 1.628, 1.527, 3.917, 0.334],
[-17.253, -42.006, 32.253, 57.006,
18.479, 0.601, 1.747, 1.007, 0.347],
[-42.006, -42.006, 57.006, 57.006,
18.815, 1.487, 1.599, 3.337, -0.862],
[-66.759, -42.006, 81.759, 57.006,
20.576, 1.623, 1.532, 3.942, 0.323],
[-23.527, -54.553, 38.527, 69.553,
15.035, 0.625, 1.744, 0.917, 0.41],
[-54.553, -54.553, 69.553, 69.553,
15.346, 1.29, 1.659, 3.083, -0.275],
[-85.58, -54.553, 100.58, 69.553,
16.326, 1.613, 1.527, 3.934, 0.268],
[-31.39, -70.281, 46.39, 85.281,
12.265, 0.631, 1.747, 0.954, 0.317],
[-70.281, -70.281, 85.281, 85.281,
11.878, 1.044, 1.67, 2.415, -0.211],
[-109.171, -70.281, 124.171, 85.281,
13.58, 1.621, 1.539, 3.961, 0.189],
[-41.247, -89.994, 56.247, 104.994,
9.932, 0.61, 1.771, 0.934, 0.486],
[-89.994, -89.994, 104.994, 104.994,
8.949, 0.811, 1.766, 1.662, 0.08],
[-138.741, -89.994, 153.741, 104.994,
11.043, 1.61, 1.533, 3.899, 0.04],
[-53.602, -114.704, 68.602, 129.704,
8.389, 0.604, 1.793, 0.95, 0.806],
[-114.704, -114.704, 129.704, 129.704,
8.071, 1.01, 1.751, 2.19, -0.076],
[-175.806, -114.704, 190.806, 129.704,
9.184, 1.606, 1.526, 3.869, -0.066],
[-69.089, -145.677, 84.089, 160.677,
6.923, 0.627, 1.791, 0.96, 0.784],
[-145.677, -145.677, 160.677, 160.677,
6.784, 1.384, 1.615, 2.862, -1.035],
[-222.266, -145.677, 237.266, 160.677,
7.863, 1.617, 1.55, 3.948, -0.071],
[-88.5, -184.5, 103.5, 199.5, 5.189,
0.66, 1.755, 0.841, 0.173],
[-184.5, -184.5, 199.5, 199.5, 4.388,
0.743, 1.728, 1.381, 0.642],
[-280.5, -184.5, 295.5, 199.5, 5.583,
1.583, 1.547, 3.862, -0.072]]
bbox_means: [[-0.0, 0.002, 0.064, -0.093, 0.011,
-0.067, 0.192, 0.059, -0.021, 0.069,
-0.004]]
bbox_stds: [[0.14, 0.126, 0.247, 0.239, 0.163,
0.132, 3.621, 0.382, 0.102, 0.503,
1.855]]
anchor_scales: [32.0, 40.11, 50.276, 63.019, 78.991,
99.012, 124.106, 155.561, 194.989,
244.409, 306.354, 384.0]
anchor_ratios: [0.5, 1.0, 1.5]
hard_negatives: True
focal_loss: 0
cls_2d_lambda: 1
iou_2d_lambda: 1
bbox_2d_lambda: 0
bbox_3d_lambda: 1
bbox_3d_proj_lambda: 0.0
hill_climbing: True
visdom_port: 8100
}
Traceback (most recent call last):
  File "scripts/train_rpn_3d.py", line 196, in <module>
    main(sys.argv[1:])
  File "scripts/train_rpn_3d.py", line 122, in main
    cls, prob, bbox_2d, bbox_3d, feat_size = rpn_net(images)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chinmay/Desktop/M3D-RPN/output/kitti_3d_multi_warmup/densenet121_3d_dilate.py", line 83, in forward
    x = self.base(x)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torchvision/models/densenet.py", line 111, in forward
    new_features = layer(features)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torchvision/models/densenet.py", line 84, in forward
    bottleneck_output = self.bn_function(prev_features)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torchvision/models/densenet.py", line 41, in bn_function
    bottleneck_output = self.conv1(self.relu1(self.norm1(concated_features)))  # noqa: T484
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 131, in forward
    return F.batch_norm(
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2056, in batch_norm
    return torch.batch_norm(
RuntimeError: CUDA out of memory. Tried to allocate 22.00 MiB (GPU 0; 5.79 GiB total capacity; 4.60 GiB already allocated; 3.81 MiB free; 4.72 GiB reserved in total by PyTorch)
```
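
In the meantime, gradient accumulation looks like a possible workaround: run the backward pass on one image at a time and step the optimizer every second iteration, so the effective batch size stays at 2 while only one image's activations are held in memory at once. A minimal self-contained sketch of the idea (dummy model and data; none of these names come from train_rpn_3d.py):

```python
import torch
import torch.nn as nn

# Gradient-accumulation sketch (dummy model and data, illustrative only):
# the per-step batch is 1, but the optimizer steps every accum_steps
# batches, so gradients average over an effective batch of 2.
net = nn.Linear(8, 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.004, momentum=0.9)
accum_steps = 2

optimizer.zero_grad()
for step in range(100):
    x, y = torch.randn(1, 8), torch.randn(1, 1)   # stand-in batch of size 1
    loss = nn.functional.mse_loss(net(x), y)
    (loss / accum_steps).backward()   # scale so accumulated grads are a mean
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

If I adapted this to train_rpn_3d.py, I would set batch_size to 1 and scale the loss the same way, though I have not verified that the batch-norm statistics behave identically with a per-step batch of 1.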

Change your batch size.
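
If lowering it to 1 still overflows on a ~6 GiB card, the values in the dump come from a plain Python config file, so they can be edited there directly. A minimal sketch, assuming the warmup config lives at scripts/config/kitti_3d_multi_warmup.py and uses the conf.<key> pattern implied by the printout above:

```python
# Assumed location: scripts/config/kitti_3d_multi_warmup.py
conf.batch_size = 1            # was 2; roughly halves activation memory
conf.crop_size = [448, 1440]   # optional: a smaller crop cuts memory further,
                               # but dimensions should stay divisible by
                               # conf.feat_stride (16), and the precomputed
                               # anchors/means/stds assume the original scale
```

batch_size alone is the safer first change; shrinking crop_size trades detection accuracy for memory.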