NVIDIA/apex

Memory leak while transferring tensor to cpu

neeraj-j opened this issue · 9 comments

Hi,

I am observing a memory leak while transferring a tensor from GPU to CPU in PyTorch. The following code summarizes the issue; here data_loader is feeding images. The memory leak is observed when using opt_level 'O1'. If I use opt_level 'O0' there is no leak. I started seeing this issue after updating apex to the current version.

model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
model.eval()
for epoch in range(10):
    for i, input in enumerate(data_loader):
        # compute output
        output = model(input)
        output = output.cpu().numpy()
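
For reference, a minimal sketch of how the growth shows up, assuming psutil is installed and using the same model, optimizer, amp and data_loader as above (the RSS printing is only for illustration, it is not part of the original script):

import psutil

process = psutil.Process()
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')
model.eval()
for epoch in range(10):
    for i, input in enumerate(data_loader):
        output = model(input)
        output = output.cpu().numpy()
    # with 'O0' the resident set size stays roughly flat; with 'O1' it keeps growing
    print("epoch %d: RSS = %d MiB" % (epoch, process.memory_info().rss // 2**20))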

I am using:
apex version: 0.1, master branch of https://github.com/NVIDIA/apex.git, dated 11-25-2019
PyTorch version: 1.3.0
Ubuntu: 18.04
CUDA: 10.1
I tried casting 'output' to float() on the GPU before transferring it to the CPU, and converting the numpy array to float16. Neither works.
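
That is, variations roughly like the following (a paraphrase, not the exact code):

import numpy as np

# attempted workarounds; neither avoids the leak
output = model(input).float().cpu().numpy()
output = model(input).cpu().numpy().astype(np.float16)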

I have the same problem :/

I am having a similar issue. It has proven very hard to track down because it appears inconsistently and only affects some computers, while others are not affected at all.

The situation in which the memory leak occurs is always the same: O1 mixed precision training. During the training loop everything is fine, but in the validation loop the RAM usage goes up, in every epoch. Disabling mixed precision training makes this problem go away.
Over the course of a training run this easily amounts to 100 GB or more of RAM usage, which is enough to break the training script.

Here are my observations so far:

  • I have only been able to get this behavior on Ubuntu 18.04. It does not occur on CentOS.
  • CUDA 10.1 on both systems
  • I have tested the most recent apex master. This problem has been there for a while though (I just could not pinpoint it, which is why I have not posted before)
  • I have tested both Python 3.6.8 and 3.8.2
  • my code does semantic segmentation with a U-Net. The memory leak only occurs with 2D convs, not with 3D convs (see the sketch after this list)
  • Tested on an RTX 2080 Ti
  • Strangely, the issue does not always appear right at the start of the training. Sometimes the first couple of epochs are fine and then after five epochs or so the issue appears. When it appears is quite inconsistent, which is another reason why it took me so long to figure out that it was related to mixed precision training
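
To illustrate the 2D vs 3D point, hypothetical toy stand-ins roughly like the ones below (arbitrary channel counts, not my actual U-Net) are enough to show the difference when plugged into the validation loop further down under O1:

import torch.nn as nn

# toy networks, only meant to illustrate the 2D vs 3D difference
net2d = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(32, 2, 1)).cuda()   # host RSS keeps growing under O1
net3d = nn.Sequential(nn.Conv3d(1, 32, 3, padding=1), nn.ReLU(),
                      nn.Conv3d(32, 2, 1)).cuda()   # no growth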

My code looks similar to what @neeraj-j has posted:

    # validation part of the epoch loop:
    with torch.no_grad():
        self.network.eval()
        val_losses = []
        for b in range(self.num_val_batches_per_epoch):
            # l is a simple scalar that has been detached and converted to numpy
            l = self.run_iteration(self.val_gen, False)
            val_losses.append(l)
        self.all_val_losses.append(np.mean(val_losses))

    def run_iteration(self, data_generator, do_backprop=True):
        data_dict = next(data_generator)
        data = data_dict['data']
        target = data_dict['target']

        data = maybe_to_torch(data)
        target = maybe_to_torch(target)

        if torch.cuda.is_available():
            data = to_cuda(data)
            target = to_cuda(target)

        self.optimizer.zero_grad()

        output = self.network(data)
        del data

        loss = self.loss(output, target)
        del target

        if do_backprop:
            if not self.fp16 or amp is None or not torch.cuda.is_available():
                loss.backward()
            else:
                with amp.scale_loss(loss, self.optimizer) as scaled_loss:
                    scaled_loss.backward()
            _ = clip_grad_norm_(self.network.parameters(), 12)
            self.optimizer.step()

        return loss.detach().cpu().numpy()

Maybe someone has an idea of what could be going on? @mcarilli perhaps? :-)

All my code is on GitHub; if you are interested, please contact me and I can give you step-by-step instructions on how to reproduce this issue.

Best,
Fabian

Hey there, this problem still persists and it would be fantastic to get a response. Is this a known issue?

Yes, this problem is still happening to me on Ubuntu 20.04. It took me a whole day to trace the memory leak down to one line:

t.to(cpuDevice).to(torch::kFloat);

Here t is a half-precision tensor on the GPU.
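
For reference, in Python this corresponds roughly to the following (t here is just a stand-in half-precision CUDA tensor, not the one from my code):

import torch

t = torch.randn(8, device="cuda", dtype=torch.float16)  # stand-in tensor
t_cpu = t.to("cpu").to(torch.float32)                    # device transfer + dtype cast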

Update: I fixed the above problem by upgrading to CUDA 11.0 and PyTorch 1.7.

I have the same problem.

Hi,
the problem goes away if you compile PyTorch yourself against a more recent version of cuDNN. I have no problems whatsoever with 8.0.2.
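
A quick way to check which CUDA and cuDNN versions a given PyTorch build was compiled against:

import torch

print(torch.version.cuda)              # CUDA version of the build
print(torch.backends.cudnn.version())  # e.g. 8002 corresponds to cuDNN 8.0.2
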
Best,
Fabian

I have the same problem. How could I fix the bug on an older PyTorch version, e.g. CUDA 10.1 + PyTorch 1.4?

@NiHaoUCAS, I think you might have to update...

As @FabianIsensee said, could compiling PyTorch against cuDNN 8.0.2 help (PyTorch 1.4 + cuDNN 8.0.2)? Updating PyTorch itself is a big challenge for us because of engine issues.