[BUG] Pretrained resnet34 output changing significantly with different batch size in eval mode.
jseia opened this issue · 2 comments
Hi! I've seen that the output of a pretrained resnet34 for the same input changes depending on the batch size the sample is included in. I've set the model to eval mode, (just in case) set all the 'reproducibility' seeds and deterministic-mode settings, and ran it with gradient tracking disabled. I've also checked that the batch-norm layers track running statistics (track_running_stats=True).
I've read in other issues that a small difference within some float eps is expected, but I'm seeing differences that are far from minor (as you can see in the example below). I don't know where this problem might originate; am I missing something? Thanks!
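(A minimal sketch of that batch-norm check, assuming a stock timm resnet34; this is not part of the repro below:)
import timm
from torch import nn
m = timm.create_model('resnet34', pretrained=True).eval()
assert not m.training  # eval mode
for mod in m.modules():
    if isinstance(mod, nn.BatchNorm2d):
        # in eval mode with track_running_stats=True the stored
        # running_mean / running_var are used, not per-batch statistics
        assert mod.track_running_stats and not mod.training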
To Reproduce
Steps to reproduce the behavior:
import random
import torch
import numpy as np
import timm
from torch import nn
random.seed(0)
torch.use_deterministic_algorithms(mode=True)
torch.manual_seed(0)
np.random.seed(0)
t1 = torch.randn(1, 3, 256, 284)
t2 = torch.randn(1, 3, 256, 284)
print((t1 == t2).all())
# tensor(False)
batch_1 = t1.clone()
batch_2 = torch.concat([t1, t2], 0)
print(batch_1.shape, batch_2.shape)
print((batch_1 == batch_2[0]).all())
# torch.Size([1, 3, 256, 284]) torch.Size([2, 3, 256, 284])
# tensor(True)
model = timm.create_model('resnet34', pretrained=True, in_chans=3, num_classes=0, global_pool='')
model.global_pool = model.fc = nn.Identity()
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model = model.to(device).eval()
batch_1 = batch_1.to(device)
batch_2 = batch_2.to(device)
with torch.no_grad():
    output1 = model(batch_1)
    output2 = model(batch_2)
print(output1.shape, output2.shape)
print((output1 == output2[0]).all())
print((output1 - output2[0]).max())
# torch.Size([1, 512, 8, 9]) torch.Size([2, 512, 8, 9])
# tensor(False, device='cuda:0')
# tensor(0.0025, device='cuda:0')
Expected behavior
I'd expect a difference around ~1e-8, not ~3e-3.
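For reference, an explicit tolerance check (the atol value here is an illustrative assumption, far looser than float32 eps but much tighter than the observed ~2.5e-3 gap) also fails:
# illustrative threshold, not a value from the report above
print(torch.allclose(output1, output2[0], rtol=0.0, atol=1e-5))
# False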
Desktop:
- OS: Ubuntu 22.04.3 LTS
- This repository version: 0.9.16
- PyTorch version w/ CUDA/cuDNN:
pytorch 2.2.1 py3.9_cuda12.1_cudnn8.9.2_0
pytorch-cuda 12.1 ha16c6d3_5
@jseia it is indeed a greater difference than I'd expect, but it's not a timm issue and there's nothing I can do about it; it's a cuDNN batchnorm quirk. If you look around, I believe it's been brought up before in the torch forums, issue trackers, etc.
If you use device = 'cpu', or disable cuDNN with torch.backends.cudnn.enabled = False (you also need to disable the forced determinism), you will see a difference of 0.
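(A sketch of that workaround applied to the repro script above; the flags must be set before the forward passes, and the zero difference is what the comment above reports:)
# disable cuDNN so PyTorch's native batchnorm kernels are used,
# and relax the forced determinism as noted above
torch.use_deterministic_algorithms(False)
torch.backends.cudnn.enabled = False
with torch.no_grad():
    out1 = model(batch_1)
    out2 = model(batch_2)
print((out1 - out2[0]).abs().max())
# expected to print 0. per the comment above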
Thank you!