Out of memory when testing the pruned model
Tianxiaomo opened this issue · 4 comments
I am using 4 × Tesla K80 GPUs (12 GB each). The training and test runs during pruning were normal, but the test run after pruning runs out of memory:
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "/home/b418-xiwei/.pycharm_helpers/pydev/pydevd.py", line 1664, in <module>
main()
File "/home/b418-xiwei/.pycharm_helpers/pydev/pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/b418-xiwei/.pycharm_helpers/pydev/pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/b418-xiwei/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/b418-xiwei/hgh/prune/finetune.py", line 343, in <module>
fine_tuner.prune()
File "/home/b418-xiwei/hgh/prune/finetune.py", line 267, in prune
self.test()
File "/home/b418-xiwei/hgh/prune/finetune.py", line 187, in test
output = model(Variable(batch))
File "/home/b418-xiwei/anaconda3/envs/distiller/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/b418-xiwei/hgh/prune/finetune.py", line 68, in forward
x = self.features(x)
File "/home/b418-xiwei/anaconda3/envs/distiller/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/b418-xiwei/anaconda3/envs/distiller/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/b418-xiwei/anaconda3/envs/distiller/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/b418-xiwei/anaconda3/envs/distiller/lib/python3.6/site-packages/torch/nn/modules/pooling.py", line 142, in forward
self.return_indices)
File "/home/b418-xiwei/anaconda3/envs/distiller/lib/python3.6/site-packages/torch/nn/functional.py", line 360, in max_pool2d
ret = torch._C._nn.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58
I use batch_size=16, so it is
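One thing that may be worth checking here: the traceback shows the test loop calling `model(Variable(batch))` with autograd enabled, so intermediate activations are kept for a backward pass that never runs. Below is a minimal sketch of an evaluation loop under `torch.no_grad()`, assuming a recent PyTorch release. The `evaluate` function, the toy model, and the toy loader are hypothetical stand-ins, not the code in finetune.py.

```python
import torch
import torch.nn as nn


def evaluate(model, loader, device):
    """Return top-1 accuracy without building an autograd graph."""
    model.eval()  # inference behaviour for dropout / batch norm
    correct, total = 0, 0
    with torch.no_grad():  # no activations are saved for backward
        for batch, label in loader:
            output = model(batch.to(device))
            pred = output.argmax(dim=1).cpu()
            correct += (pred == label).sum().item()
            total += label.size(0)
    return correct / total


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy stand-in for the pruned network (hypothetical, for illustration only).
toy_model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),
).to(device)

# Two random batches of 16 images, matching batch_size=16 from the post.
toy_loader = [
    (torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,)))
    for _ in range(2)
]

accuracy = evaluate(toy_model, toy_loader, device)
print(f"Accuracy: {accuracy:.4f}")
```

This only lowers the peak memory of the forward pass. It would not help if the memory is held by something else that survives pruning, such as stored activations or leftover gradients.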
What GPU are you using?
Have you solved this problem? I found that the two backward operations at lines 172 and 174 of finetune.py roughly double the memory usage twice, raising my GPU memory from 3200 MB to 7000 MB and then to 11000 MB. The first increase happens while computing the pruning plan, so the gradients calculated there are useless for fine-tuning, but I haven't found a way to clear that gradient cache.
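For the gradients left over from the ranking pass, one option that might help is to drop each parameter's `.grad` tensor explicitly and then ask the caching allocator to return its unused blocks. This is a hedged sketch: `release_gradients` is a hypothetical helper, not part of finetune.py. Whether it recovers the memory depends on nothing else still referencing the graph, for example a list of stored activations or registered hooks.

```python
import gc

import torch
import torch.nn as nn


def release_gradients(model):
    """Drop gradients left on the parameters and free cached GPU blocks."""
    for param in model.parameters():
        param.grad = None  # remove the reference so the tensor can be freed
    gc.collect()  # collect any unreachable graph objects
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return unused cached blocks to the driver


# Toy stand-in for the ranking pass: one forward/backward to populate .grad.
toy_model = nn.Linear(4, 2)
loss = toy_model(torch.randn(8, 4)).sum()
loss.backward()
grads_before = [p.grad is not None for p in toy_model.parameters()]

del loss  # drop the last reference to the autograd graph
release_gradients(toy_model)
grads_after = [p.grad is None for p in toy_model.parameters()]
```

Note that `torch.cuda.empty_cache()` cannot free memory that live tensors still occupy. It only releases cached blocks that are no longer in use, which is what makes the drop visible in nvidia-smi.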
@Tianxiaomo Hi, can you tell me the command for testing the model? I can't find it. Thanks.
@CodePlay2016 I am facing an almost identical out-of-memory problem.
Could you comment on this? Do you have a working countermeasure so far?
[phung@archlinux pytorch-pruning]$ python finetune.py --prune
/usr/lib/python3.7/site-packages/torchvision/transforms/transforms.py:187: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
warnings.warn("The use of the transforms.Scale transform is deprecated, " +
/usr/lib/python3.7/site-packages/torchvision/transforms/transforms.py:562: UserWarning: The use of the transforms.RandomSizedCrop transform is deprecated, please use transforms.RandomResizedCrop instead.
warnings.warn("The use of the transforms.RandomSizedCrop transform is deprecated, " +
Accuracy: 0.5848
Number of prunning iterations to reduce 67% filters 5
Ranking filters..
Traceback (most recent call last):
File "finetune.py", line 270, in <module>
fine_tuner.prune()
File "finetune.py", line 217, in prune
prune_targets = self.get_candidates_to_prune(num_filters_to_prune_per_iteration)
File "finetune.py", line 184, in get_candidates_to_prune
self.train_epoch(rank_filters = True)
File "finetune.py", line 179, in train_epoch
self.train_batch(optimizer, batch.cuda(), label.cuda(), rank_filters)
File "finetune.py", line 172, in train_batch
self.criterion(output, Variable(label)).backward()
File "/usr/lib/python3.7/site-packages/torch/tensor.py", line 96, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/lib/python3.7/site-packages/torch/autograd/__init__.py", line 90, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: out of memory
[phung@archlinux pytorch-pruning]$
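To narrow down which step of the ranking pass is responsible for the growth, it may help to log the allocator's counters around the forward and backward calls. This is a minimal sketch, assuming a PyTorch version that provides `torch.cuda.memory_allocated` and `torch.cuda.max_memory_allocated`. `gpu_memory_mib` is a hypothetical helper, and the toy model stands in for the real network.

```python
import torch
import torch.nn as nn


def gpu_memory_mib():
    """Return (current, peak) allocated GPU memory in MiB, or zeros on CPU."""
    if not torch.cuda.is_available():
        return 0.0, 0.0
    current = torch.cuda.memory_allocated() / 1024 ** 2
    peak = torch.cuda.max_memory_allocated() / 1024 ** 2
    return current, peak


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy stand-ins for the model, loss, and one batch (hypothetical).
toy_model = nn.Linear(64, 10).to(device)
criterion = nn.CrossEntropyLoss()
batch = torch.randn(16, 64, device=device)
label = torch.randint(0, 10, (16,), device=device)

# Record the counters before and after each step of one training batch.
readings = {"start": gpu_memory_mib()}
output = toy_model(batch)
readings["after forward"] = gpu_memory_mib()
criterion(output, label).backward()
readings["after backward"] = gpu_memory_mib()

for step, (current, peak) in readings.items():
    print(f"{step}: current={current:.1f} MiB, peak={peak:.1f} MiB")
```

These counters cover only tensors allocated through PyTorch, so they will usually read lower than the figure nvidia-smi shows, which also includes the CUDA context and cached blocks.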