Environment: NVIDIA 1080Ti, CUDA 8.0, cuDNN 6.0, PyTorch 0.2.0

Results:

use_cuda: True, has_backward: False
method0: 8.669853210449218e-05, batch_size: 8, size: 8, num_rois: 10
method1: 0.0017281007766723634, batch_size: 8, size: 8, num_rois: 10
method0: 0.00010861873626708984, batch_size: 8, size: 8, num_rois: 100
method1: 0.015480890274047851, batch_size: 8, size: 8, num_rois: 100
method0: 0.0001151275634765625, batch_size: 64, size: 64, num_rois: 100
method1: 0.015230441093444824, batch_size: 64, size: 64, num_rois: 100
method0: 0.0007535743713378906, batch_size: 64, size: 64, num_rois: 1000
method1: 0.1613228702545166, batch_size: 64, size: 64, num_rois: 1000
method0: 0.00024219512939453126, batch_size: 256, size: 256, num_rois: 100
method1: 0.01744112491607666, batch_size: 256, size: 256, num_rois: 100
method0: 0.0008198451995849609, batch_size: 256, size: 256, num_rois: 1000
method1: 0.1770816421508789, batch_size: 256, size: 256, num_rois: 1000

use_cuda: True, has_backward: True
method0: 0.00018054485321044922, batch_size: 8, size: 8, num_rois: 10
method1: 0.006248035430908203, batch_size: 8, size: 8, num_rois: 10
method0: 0.0003832864761352539, batch_size: 8, size: 8, num_rois: 100
method1: 0.06724734783172608, batch_size: 8, size: 8, num_rois: 100
method0: 0.0019525957107543945, batch_size: 64, size: 64, num_rois: 100
method1: 0.05075277805328369, batch_size: 64, size: 64, num_rois: 100
method0: 0.0017806100845336914, batch_size: 64, size: 64, num_rois: 1000
method1: 0.4923022508621216, batch_size: 64, size: 64, num_rois: 1000
method0: 0.06174903392791748, batch_size: 256, size: 256, num_rois: 100
method1: 0.43788302898406983, batch_size: 256, size: 256, num_rois: 100
method0: 0.06140669345855713, batch_size: 256, size: 256, num_rois: 1000
method1: 3.2348715257644653, batch_size: 256, size: 256, num_rois: 1000

# see https://discuss.pytorch.org/t/extract-sub-region-of-conv-feature-map/1480/2