Memory leak
miaomi1994 opened this issue · 6 comments
Hello author, while training a model with your code I noticed that memory usage keeps growing. Have you come across a memory leak?
@miaomi1994 Are you facing that issue with S0 or B0? I noticed that the Swish implementation in the S0 variant takes much more memory. I will plug in a memory-efficient version of Swish to lower the memory cost and do memory profiling for B0 as well.
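For readers unfamiliar with the idea, a memory-efficient Swish is commonly written as a custom autograd function that saves only the input and recomputes the sigmoid during the backward pass, instead of letting autograd keep the intermediate tensors. The sketch below is an illustration of that general technique, not necessarily the version the author plans to plug in; `SwishFunction` and `MemoryEfficientSwish` are names chosen here for the example.

```python
import torch
from torch import nn


class SwishFunction(torch.autograd.Function):
    """Swish, x * sigmoid(x), that stores only the input for backward."""

    @staticmethod
    def forward(ctx, x):
        # Save the input only; sigmoid(x) is recomputed in backward.
        ctx.save_for_backward(x)
        return x * torch.sigmoid(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        sig = torch.sigmoid(x)
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        return grad_output * (sig * (1 + x * (1 - sig)))


class MemoryEfficientSwish(nn.Module):
    """Module wrapper so the function can be dropped into a model."""

    def forward(self, x):
        return SwishFunction.apply(x)
```

The trade-off is a little extra compute in the backward pass in exchange for fewer activations held in memory between forward and backward.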
@digantamisra98 I faced the issue with B0. Which version of PyTorch are you using? Thanks.
@miaomi1994 Interesting. Can you please provide your memory consumption details? My PyTorch version is 1.5.0+cu101.
@digantamisra98 When I use B0, memory keeps growing. I train on my own dataset (about 2 million samples, batch_size=480), and after several epochs the program runs out of memory and crashes. S0 behaves normally. My PyTorch version is 1.3.1. I use PyTorch multiprocessing and DistributedDataParallel.
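One way to turn "memory keeps growing" into concrete numbers is to take one reading per epoch and check whether it rises every time. The following is a minimal sketch using only the standard library. It assumes a Unix system and measures host (CPU) peak memory of the current process, so under DistributedDataParallel each rank would need to call it separately. `peak_rss_mb` and `EpochMemoryLog` are hypothetical helper names, not part of this repository or of PyTorch.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


class EpochMemoryLog:
    """Collects one memory reading per epoch and reports growth."""

    def __init__(self, reader=peak_rss_mb):
        self.reader = reader  # any zero-argument callable returning a number
        self.readings = []

    def record(self):
        """Take a reading; call this once at the end of every epoch."""
        value = self.reader()
        self.readings.append(value)
        return value

    def growth(self):
        """Last reading minus first reading (0.0 with fewer than two)."""
        if len(self.readings) < 2:
            return 0.0
        return self.readings[-1] - self.readings[0]

    def grows_every_epoch(self):
        """True if each reading is strictly larger than the one before."""
        pairs = zip(self.readings, self.readings[1:])
        return len(self.readings) > 1 and all(b > a for a, b in pairs)
```

If the suspected growth is on the GPU instead, the same class can be given a different reader, for example `lambda: torch.cuda.memory_allocated() / 1e6`.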
@miaomi1994 I will try to reproduce your memory issue in my own tests next weekend and see what is causing it.
@miaomi1994 Here is the memory profiling for both the S0 and B0 variants for an input of shape (256, 32, 224, 224) (B, C, H, W):
For S0:
GPU Memory Track | 21-Jul-20-19:02:03 | Total Used Memory:830.3 Mb
- | 4 * Size:(1, 32, 1, 1) | Memory: 0.0005 M | <class 'torch.nn.parameter.Parameter'>
- | 4 * Size:(1, 32, 1, 1) | Memory: 0.0005 M | <class 'torch.Tensor'>
At main : line 21 Total Used Memory:830.3 Mb
- | 1 * Size:(256, 32, 224, 224) | Memory: 1644.1 M | <class 'torch.Tensor'>
At main : line 24 Total Used Memory:2474.5 Mb
- | 2 * Size:(256, 32, 224, 224) | Memory: 3288.3 M | <class 'torch.Tensor'>
- | 1 * Size:(256, 32, 224, 224) | Memory: 1644.1 M | <class 'torch.Tensor'>
At main : line 27 Total Used Memory:12339.5Mb
At main : line 30 Total Used Memory:13983.7Mb
For B0:
GPU Memory Track | 21-Jul-20-19:10:19 | Total Used Memory:830.3 Mb
- | 4 * Size:(1, 32, 1, 1) | Memory: 0.0005 M | <class 'torch.nn.parameter.Parameter'>
- | 4 * Size:(1, 32, 1, 1) | Memory: 0.0005 M | <class 'torch.Tensor'>
At main : line 21 Total Used Memory:830.3 Mb
- | 1 * Size:(256, 32, 224, 224) | Memory: 1644.1 M | <class 'torch.Tensor'>
At main : line 24 Total Used Memory:2474.5 Mb
- | 2 * Size:(256, 32, 224, 224) | Memory: 3288.3 M | <class 'torch.Tensor'>
- | 1 * Size:(256, 32, 224, 224) | Memory: 1644.1 M | <class 'torch.Tensor'>
At main : line 27 Total Used Memory:11106.4Mb
At main : line 30 Total Used Memory:15627.8Mb
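As a sanity check on the per-tensor figures in the logs above, the size of a tensor can be worked out from its shape. The sketch below assumes float32 elements (4 bytes each) and decimal megabytes (1 MB = 10^6 bytes). Both are assumptions, since the logs do not state the dtype or the unit convention; `tensor_bytes` is a helper name made up for this example.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed for a dense tensor of the given shape (float32 assumed)."""
    return prod(shape) * bytes_per_element


# The (B, C, H, W) activation that appears in the logs.
activation_mb = tensor_bytes((256, 32, 224, 224)) / 1e6
print(f"{activation_mb:.2f}")  # 1644.17

# Four tensors of shape (1, 32, 1, 1), as in the first log entries.
small_mb = 4 * tensor_bytes((1, 32, 1, 1)) / 1e6
print(f"{small_mb:.6f}")  # 0.000512
```

Under these assumptions the results line up with the logged 1644.1 M and 0.0005 M if the profiler truncates the displayed value instead of rounding it.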