Too long test time
MRRRKING opened this issue · 30 comments
Thank you for your code.
When I use the pytorch1.6.0 branch with the default settings, the test time is extremely long (~7 days).
My GPU is a 1080 Ti, and the training time is normal (~13 hours).
Have you ever encountered the same problem? Do you have any ideas? Thanks a lot.
Hi, I haven't run into this problem before. I asked my friends to help me test on their machines, and the test time looks normal (2-3 hours on the VOC 2007 test set). Could you check whether you are running multiple experiments on the same GPU at the same time?
Thank you for your code.
I also encountered this problem.
My GPU is a 3080; the test time is also extremely long (~36 hours), while the training time is normal (~6 hours).
My PyTorch version is 1.7.0 with CUDA 11.1 (NVIDIA driver version 455.38).
I used the default settings (although I installed mmcv built for CUDA 11).
Could this problem be related to the CUDA or driver version?
Thanks a lot.
Hi @MRRRKING @U201714643, we have tested on V100 and 2080 Ti GPUs, and the testing speed looks reasonable (less than 3 hours on the VOC 2007 test set). Could you try to follow install.sh to set up the environment? Could you also check the GPU utilization rate with nvidia-smi?
Yes, I followed install.sh, and the GPU utilization is 100%.
I tried setting the parameter TEST.BBOX_AUG.ENABLED to False, and the testing time becomes normal (~2 hours).
I suspect the problem lies in the im_detect_bbox_aug function.
Line 136 in dc16cfa
Do you have any direction to solve the problem?
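For reference, this is roughly how I disabled the flag before testing (just a minimal sketch; the `core.config` import path is my guess at this repo's layout, the option name is the one mentioned above):

```python
from core.config import cfg  # import path assumed

# Turn off multi-scale / flipped test-time augmentation so each image is run only once.
cfg.TEST.BBOX_AUG.ENABLED = False
```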
That's weird. Sorry, I don't have 1080 Ti GPUs and thus cannot reproduce the issue. Could you try to record the time cost of each part in im_detect_bbox_aug and the time cost of each line in these codes?
In addition, if TEST.BBOX_AUG.ENABLED is set to False, the test time will be reduced by about 10x, so the reasonable test time should be less than half an hour.
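Something like this minimal timing wrapper could help (the torch.cuda.synchronize() calls matter, otherwise asynchronous CUDA launches make the time show up at the wrong line; the wrapper and the call in the comment are only illustrative):

```python
import time

import torch


def timed(label, fn, *args, **kwargs):
    """Run fn and print how long it really took on the GPU."""
    torch.cuda.synchronize()   # flush pending kernels before starting the clock
    start = time.time()
    out = fn(*args, **kwargs)
    torch.cuda.synchronize()   # wait for this call's kernels to finish
    print('{}: {:.4f} s'.format(label, time.time() - start))
    return out

# e.g. inside im_detect_bbox_aug, wrap each per-scale detection call:
#   scores_i, boxes_i = timed('scale %s' % scale, im_detect_bbox, model, im, scale, max_size, boxes)
# (that im_detect_bbox signature is only indicative)
```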
Hi.
I think this line might be the cause of the long test time:
pcl.pytorch/lib/modeling/model_builder.py
Line 119 in dc16cfa
This is how I measured its run time (`import time` is needed at the top of model_builder.py):

```python
torch.cuda.synchronize()
start = time.time()
############################
blob_conv = self.Conv_Body(im_data).contiguous()
############################
torch.cuda.synchronize()
end = time.time()
print('blob_conv = self.Conv_Body(im_data).contiguous():', end - start, 's')
```
And this is the result:

```
blob_conv = self.Conv_Body(im_data).contiguous(): 0.16710186004638672 s
blob_conv = self.Conv_Body(im_data).contiguous(): 0.22522902488708496 s
blob_conv = self.Conv_Body(im_data).contiguous(): 0.21841096878051758 s
blob_conv = self.Conv_Body(im_data).contiguous(): 0.9169421195983887 s
blob_conv = self.Conv_Body(im_data).contiguous(): 0.9236195087432861 s
blob_conv = self.Conv_Body(im_data).contiguous(): 2.9725072383880615 s
blob_conv = self.Conv_Body(im_data).contiguous(): 2.966435432434082 s
blob_conv = self.Conv_Body(im_data).contiguous(): 8.325863361358643 s
blob_conv = self.Conv_Body(im_data).contiguous(): 8.330979108810425 s
blob_conv = self.Conv_Body(im_data).contiguous(): 0.15090179443359375 s
INFO test_engine.py: 270: im_detect: range [1, 4952] of 4952: 1/4952 25.556s (eta: 1 day, 11:08:47)
```
Do you have any direction to solve the problem?
Could you try to add torch.cuda.empty_cache() after this line of code? I'm not sure about the exact reason; one possible reason is that the GPU cache is not released after testing each image.
Btw, could you also make sure to add CUDA_VISIBLE_DEVICES=0 at the beginning of the test command and not use --multi-gpu-testing? There are some bugs in multi-GPU testing.
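Concretely, something like this, placed right after the per-image detection call in the test loop (exactly where "this line" refers to; only the empty_cache() call is the suggestion, the surrounding comment is indicative):

```python
import torch

# ... inside the loop over test images, immediately after the forward pass for one image ...
torch.cuda.empty_cache()  # hand cached, unused GPU blocks back so fragmentation does not build up
```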
I tested the running time in this way:

```python
print(target_scale)
print('*************************************')
time4 = time()
return_dict = model(**inputs)
time5 = time()
print('time5: ', time5 - time4)
# cls prob (activations after softmax)
scores = return_dict['refine_score'][0].data.cpu().numpy().squeeze()
for i in range(1, cfg.REFINE_TIMES):
    scores += return_dict['refine_score'][i].data.cpu().numpy().squeeze()
scores /= cfg.REFINE_TIMES
# In case there is 1 proposal
scores = scores.reshape([-1, scores.shape[-1]])
time6 = time()
print('time6: ', time6 - time5)
```
And the result is below:

```
480
time5: 1.396902084350586
time6: 0.05057382583618164
576
time5: 0.0050432682037353516
time6: 2.703824758529663
688
time5: 0.005361795425415039
time6: 4.799558162689209
864
time5: 0.005181312561035156
time6: 11.79275107383728
1200
time5: 0.0075037479400634766
time6: 27.75927186012268
```
From the test results, the time is not spent on model prediction but on data conversion.
Line 109 in 0896c82
I tested the old code, which runs in a PyTorch 0.4.1 environment, and its testing time is normal.
Is it related to the PyTorch version?
Hi,
Thanks for your advice, but torch.cuda.empty_cache() doesn't seem to help.
I am sure that I added CUDA_VISIBLE_DEVICES=0 at the beginning of the test command and did not use --multi-gpu-testing.
Moreover, I think this problem might be related to vgg16, because when i = 5, line 113 takes a very long time to run.
pcl.pytorch/lib/modeling/vgg16.py
Lines 111 to 114 in 4c3cfc9
For example:
```
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' % 1 )(): 0.014625310897827148 s
Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' % 2 )(): 0.011127233505249023 s
Sequential(
(0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' % 3 )(): 0.015486001968383789 s
Sequential(
(0): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU(inplace=True)
(6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' % 4 )(): 0.014803886413574219 s
Sequential(
(0): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): ReLU(inplace=True)
(2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(3): ReLU(inplace=True)
(4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(5): ReLU(inplace=True)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
x = getattr(self, 'conv%d' % 5 )(): 8.2745041847229 s
Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
(1): ReLU(inplace=True)
(2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
(3): ReLU(inplace=True)
(4): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(2, 2), dilation=(2, 2))
(5): ReLU(inplace=True)
)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
```
And this is how I measure its run time:

```python
print('-------------------------------------------------------------------')
torch.cuda.synchronize()
start = time.time()
############################
x = getattr(self, 'conv%d' % i)(x)
############################
torch.cuda.synchronize()
end = time.time()
print('x = getattr(self, \'conv%d\' % ', i, ')():', end - start, 's')
print(getattr(self, 'conv%d' % i))
print('-------------------------------------------------------------------')
```
Besides, during testing, my GPU utilization is about 95% and VRAM usage is about 4500 MB.
Do you have any direction to solve the problem?
Could you try to change the score-conversion code above to the following?

```python
scores = return_dict['refine_score'][0].squeeze()
for i in range(1, cfg.REFINE_TIMES):
    scores += return_dict['refine_score'][i].squeeze()
scores /= cfg.REFINE_TIMES
# In case there is 1 proposal
scores = scores.view(-1, scores.size(-1)).data.cpu().numpy()
```

This keeps the accumulation on the GPU and only moves the final result to the CPU.
I don't think the issue is from the PyTorch version. On my GPUs, I could get correct results using PyTorch 1.6.0.
That's weird...
Could you try to change padding and dilation to (1, 1) for conv5 to see what will happen?
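If it helps, this is a minimal sketch of that quick experiment, patching the already-built model in place (model.Conv_Body.conv5 is my guess at the attribute path, based on the vgg16.py snippet above; editing vgg16.py directly works just as well):

```python
import torch.nn as nn

# Drop conv5 back to ordinary 3x3 convolutions (dilation 1, padding 1) for a timing test.
for m in model.Conv_Body.conv5.modules():
    if isinstance(m, nn.Conv2d):
        m.dilation = (1, 1)
        m.padding = (1, 1)
```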
Thanks for your advice. It works.
Now mAP is 51.3% and CorLoc is 67.0% (the model has been re-trained).
In addition, my training time is about 6 hours and the testing time is about 1 hour.
Besides, my PyTorch version is 1.7.1, because the RTX 3080 doesn't support CUDA 10.2, which is required by PyTorch 1.6.0.
It seems to be the dilated convolution issue. Your model is trained with dilation 2 but tested with dilation 1, which results in inconsistency between training and testing and thus harms performance. Could you also try to train with dilation 1 to see the numbers?
Thanks for your advice.
But I did re-train the model with dilation 1, so training and testing both used dilation 1.
I see. Maybe dilation 1 is the reason for the performance drop.
Btw, could you try to add
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
after this line of code for dilation=2?
Thanks for your advice, but the testing time is still ~36 hours with dilation 2.
I replaced that code with your suggested version, but it didn't help.
The testing time is normal with dilation 1, but the mAP is lower too.
Are you using PyTorch 1.7.0 or 1.7.1?
Hello, I met the same problem during testing.
I use a GTX 1080 Ti with PyTorch 1.6.0, but I made the following changes in install.sh:
1. `pip --no-cache-dir install mmcv-full==latest+torch1.6.0+cu101 -f https://openmmlab.oss-accelerate.aliyuncs.com/mmcv/dist/index.html`
   changed to
   `pip install mmcv-full -f https://download.openmmlab.oss.com/mmcv/dist/cu101/torch1.6.0/index.html` (according to the official instructions), because the original command always reported errors for me.
2. `pip --no-cache-dir install numpy==1.16.0`
   changed to
   `pip --no-cache-dir install numpy==1.19.5`, because while trying to fix the mmcv problem, the environment always downloaded 1.16.0 first, then automatically removed it and used 1.19.5.
After that, training takes a normal ~13 hours, but the test ETA showed about 6 days.
I don't know if these changes could cause problems in the test.
Could you also try to add torch.backends.cudnn.enabled = False after this line of code for dilation=2? Other people have observed a similar issue of slow dilated convolution on some GPU cards: https://discuss.pytorch.org/t/speed-drop-with-dilated-conv-on-different-gpus/82412
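For example, near the top of the test script, before the model runs (the exact placement is the line referenced above; the flag itself is standard PyTorch):

```python
import torch

# Bypass cuDNN so the dilation=2 convolutions fall back to PyTorch's native kernels,
# avoiding the slow cuDNN algorithm choice seen on some cards for dilated convolutions.
torch.backends.cudnn.enabled = False
```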
No, I'm not on 1.7; I use PyTorch 1.6.0.
Thanks for your advice. It works.
Now mAP is 51.7% and CorLoc is 68.2% (the model is trained with dilation 2).
In addition, the testing time is about 80 minutes.
Besides, VRAM usage ranges between 7500 MB and 9500 MB, which is more than when testing with cuDNN enabled.
It works. Thanks a lot.
Now the testing time is about 2.5 hours, and the mAP is 51.9%.
Thanks for your help!
It works. The testing time is now a normal ~2h24min.
Mean AP = 0.5231
Hi,
Did you re-train your model with cudnn disabled?
Great! Thanks for helping to debug! It is unnecessary to re-train the model with cudnn disabled. Btw, you could try different random seeds (1~10) by changing cfg.RNG_SEED to reproduce the reported numbers.
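A minimal sketch of trying a different seed (how cfg.RNG_SEED is propagated inside the repo is my assumption; the point is only to vary it between runs and compare mAP/CorLoc):

```python
import numpy as np
import torch

from core.config import cfg  # import path assumed

cfg.RNG_SEED = 3                 # try values in 1~10
np.random.seed(cfg.RNG_SEED)     # typical places a global seed is applied (assumed)
torch.manual_seed(cfg.RNG_SEED)
```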
I just added `torch.backends.cudnn.enabled = False` in test_net, and all other code is default.
Hello! Do you know how to visualize the detection results?