digantamisra98/EvoNorm

Memory leak

miaomi1994 opened this issue · 6 comments

Hi, while training a model with your code, I noticed that memory usage keeps growing. Have you run into a memory leak?

@miaomi1994 Are you facing that issue with S0 or B0? I noticed that the Swish implementation in the S0 variant takes much more memory. I will plug in a memory-efficient version of Swish to lower memory cost and do a memory profile for B0 as well.

@digantamisra98 I faced the issue with B0. Which version of PyTorch are you using? Thanks.

@miaomi1994 Interesting. Could you share your memory consumption details? My PyTorch version is 1.5.0+cu101.
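One way to gather such memory numbers is to log them once per epoch. This is a minimal sketch assuming a Linux host; `log_memory` is a hypothetical helper, not part of EvoNorm. It reports the process's peak resident set size (host RAM) and, when CUDA is available, the GPU memory currently held by tensors and its peak.

```python
import resource

import torch


def log_memory(tag: str) -> dict:
    """Hypothetical helper: print host and GPU memory usage at one point in training."""
    stats = {
        # Peak resident set size of this process; ru_maxrss is in KiB on Linux.
        "host_peak_rss_mib": resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024,
    }
    if torch.cuda.is_available():
        # Memory occupied by live tensors on the current device, and its peak so far.
        stats["gpu_allocated_mib"] = torch.cuda.memory_allocated() / 2**20
        stats["gpu_peak_allocated_mib"] = torch.cuda.max_memory_allocated() / 2**20
    print(tag, {k: round(v, 1) for k, v in stats.items()})
    return stats
```

Calling `log_memory(f"epoch {epoch}")` at the end of each epoch in every process makes a steady increase easy to spot. Peak RSS never decreases, so the signal to look for is that it keeps rising epoch after epoch rather than levelling off.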

@digantamisra98 With B0, memory keeps growing. I train on my own dataset (about 2 million samples, batch_size=480), and after several epochs the program runs out of memory and crashes. S0 behaves normally. My PyTorch version is 1.3.1, and I use PyTorch multiprocessing with DistributedDataParallel.
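Since only B0 grows while S0 is normal, one thing worth checking is how running statistics are updated: EvoNorm-B0 uses the batch variance during training and a moving average of it at inference, whereas S0 keeps no running statistics. A common cause of memory that grows every iteration is updating such a buffer with a tensor still attached to the autograd graph. The sketch below only illustrates that general pitfall; `RunningVar` and `momentum=0.9` are assumptions, not the repository's code.

```python
import torch


class RunningVar(torch.nn.Module):
    """Hypothetical B0-style running-variance tracker (illustration only)."""

    def __init__(self, channels: int, momentum: float = 0.9):
        super().__init__()
        self.momentum = momentum
        self.register_buffer("running_var", torch.ones(1, channels, 1, 1))

    def forward(self, x):
        if self.training:
            # Per-channel batch variance; this stays in the graph for backprop.
            var = x.var(dim=(0, 2, 3), keepdim=True)
            with torch.no_grad():
                # Update the buffer outside autograd. Without no_grad() (or .detach()),
                # the buffer would get a grad_fn referencing this step's graph, each
                # later update would chain onto it, and memory would grow every step.
                self.running_var.mul_(self.momentum).add_((1 - self.momentum) * var)
            return var
        # At inference, use the moving average instead of batch statistics.
        return self.running_var
```

If the buffer's `grad_fn` is not `None` after a training step, it is still holding on to the graph.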

@miaomi1994 I will try to reproduce your memory issue in my own tests next weekend and find the cause.

@miaomi1994 Here is the memory profiling output for both the S0 and B0 variants, for an input of shape (256, 32, 224, 224) (B, C, H, W):

For S0:

GPU Memory Track | 21-Jul-20-19:02:03 | Total Used Memory:830.3 Mb

  • | 4 * Size:(1, 32, 1, 1) | Memory: 0.0005 M | <class 'torch.nn.parameter.Parameter'>
  • | 4 * Size:(1, 32, 1, 1) | Memory: 0.0005 M | <class 'torch.Tensor'>
    At main : line 21 Total Used Memory:830.3 Mb
  • | 1 * Size:(256, 32, 224, 224) | Memory: 1644.1 M | <class 'torch.Tensor'>
    At main : line 24 Total Used Memory:2474.5 Mb
  • | 2 * Size:(256, 32, 224, 224) | Memory: 3288.3 M | <class 'torch.Tensor'>
  • | 1 * Size:(256, 32, 224, 224) | Memory: 1644.1 M | <class 'torch.Tensor'>
    At main : line 27 Total Used Memory:12339.5Mb
    At main : line 30 Total Used Memory:13983.7Mb

For B0:

GPU Memory Track | 21-Jul-20-19:10:19 | Total Used Memory:830.3 Mb

  • | 4 * Size:(1, 32, 1, 1) | Memory: 0.0005 M | <class 'torch.nn.parameter.Parameter'>
  • | 4 * Size:(1, 32, 1, 1) | Memory: 0.0005 M | <class 'torch.Tensor'>
    At main : line 21 Total Used Memory:830.3 Mb
  • | 1 * Size:(256, 32, 224, 224) | Memory: 1644.1 M | <class 'torch.Tensor'>
    At main : line 24 Total Used Memory:2474.5 Mb
  • | 2 * Size:(256, 32, 224, 224) | Memory: 3288.3 M | <class 'torch.Tensor'>
  • | 1 * Size:(256, 32, 224, 224) | Memory: 1644.1 M | <class 'torch.Tensor'>
    At main : line 27 Total Used Memory:11106.4Mb
    At main : line 30 Total Used Memory:15627.8Mb
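The per-tensor figure in these logs can be checked by hand: a float32 tensor of shape (256, 32, 224, 224) occupies 4 bytes per element, which gives the 1644.1 reported above when counted in decimal megabytes. The same arithmetic shows that the final totals for B0 and S0 differ by exactly that amount, consistent with B0 holding one more full-size activation at that point; the log alone does not say which tensor it is. Float32 storage is an assumption based on PyTorch's default dtype.

```python
# Shape of the profiled input (B, C, H, W); float32 assumed (PyTorch's default).
B, C, H, W = 256, 32, 224, 224
BYTES_PER_FLOAT32 = 4

numel = B * C * H * W
size_bytes = numel * BYTES_PER_FLOAT32

print(numel)               # 411041792 elements
print(size_bytes / 1e6)    # 1644.167168 decimal MB, shown as 1644.1 in the log
print(size_bytes / 2**20)  # 1568.0 MiB

# Final totals at "line 30" from the two logs above.
s0_total, b0_total = 13983.7, 15627.8
print(round(b0_total - s0_total, 1))  # 1644.1, i.e. one extra activation-sized tensor
```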