convnext_tiny model produces outputs that are dependent on the batch size

Question

Avishka-Perera opened this issue 5 months ago · 1 comment

Describe the bug
Once model.eval() has been called, a neural network should normally produce the same output for a given input, regardless of what else is in the batch. With the convnext_tiny model this does not hold: the output for a sample changes with the batch size.

To Reproduce

import torch
from timm import create_model

torch.manual_seed(0)
device = "cuda"

model = create_model("convnext_tiny.fb_in22k", pretrained=True)
_ = model.to(device)
model.eval()

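# torch.randn gives an initialized, seeded input; torch.Tensor(4, 3, 224, 224) returns uninitialized memory
batch_inp = torch.randn(4, 3, 224, 224, device=device)
single_inp = batch_inp[0].unsqueeze(0)

with torch.no_grad():  # inference only, no autograd graph needed
    batch_out = model(batch_inp)
    single_out = model(single_inp)
print(batch_out[0][0].item(), single_out[0][0].item())

# Output from the original run (exact values depend on the input): -1.7183468341827393 -1.718350887298584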

Expected behavior
The two printed values above differ slightly. If the model were working correctly, they should be identical.

Desktop:

OS: Ubuntu 22.04
This repository version: timm==1.0.7
PyTorch version: torch==2.3.1
System CUDA==12.4

conda list output

nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
nvidia-cudnn-cu12         8.9.2.26                 pypi_0    pypi
nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
nvidia-nccl-cu12          2.20.5                   pypi_0    pypi
nvidia-nvjitlink-cu12     12.5.82                  pypi_0    pypi
nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi

Answer 1 · 2024-07-31T15:44:41.000Z

@Avishka-Perera This is expected PyTorch behaviour and is not specific to timm or to this model. There is a similar discussion in #1509.
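
A likely reason, not spelled out in the reply, is that floating-point addition is not associative: when the batch size changes, the backend may pick a different kernel or reduction order, and the rounding differs in the last few bits. The sketch below uses only the Python standard library, so it shows the general effect rather than anything about PyTorch internals. The helper relative_difference and the tolerance rel_tol=1e-5 are illustrative assumptions, not PyTorch defaults; the two compared numbers are the ones printed in the issue.

```python
import math

# Floating-point addition is not associative: the grouping changes the rounding.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)  # False
print(left_to_right, right_to_left)

# The two values printed in the issue (batch of 4 vs. batch of 1).
batch_value = -1.7183468341827393
single_value = -1.718350887298584


def relative_difference(a: float, b: float) -> float:
    """Hypothetical helper: absolute difference scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))


rel_diff = relative_difference(batch_value, single_value)
print(f"relative difference: {rel_diff:.2e}")

# Compare with a tolerance instead of exact equality.
# rel_tol=1e-5 is an illustrative choice, not a PyTorch default.
values_match = math.isclose(batch_value, single_value, rel_tol=1e-5)
print(values_match)  # True
```

For whole tensors, torch.allclose or torch.testing.assert_close play the same role as math.isclose here; suitable tolerances depend on the dtype and the model.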