Missing patches when zarr-chunks is not "full"

Question

Closed this issue 8 months ago · 3 comments

Related to #3 (Inference Sampler)

The grid created by zds.PatchSampler doesn't take the border of the zarr array into account when zarr.shape is not a multiple of the chunk shape.

Small example:

%load_ext autoreload
%autoreload 2

import zarr
import zarrdataset as zds
from torch.utils.data import DataLoader

filename = r"data.zarr"

# create a zero-filled zarr array whose X size (6) is not a multiple of the chunk size (4)
z = zarr.zeros((1, 1, 6), chunks=(1, 1, 4), dtype='uint8')
zarr.save(filename, z)


patch_size = dict(Z=1, Y=1, X=2)
patch_sampler = zds.PatchSampler(patch_size=patch_size)

my_datasets = zds.ZarrDataset(
    [
        zds.ImagesDatasetSpecs(
            filenames=filename,
            source_axes="ZYX",
            axes="ZYX",
        )
    ],
    patch_sampler=patch_sampler,
    return_positions=True,
    return_worker_id=True
)

my_dataloader = DataLoader(
    my_datasets,
    num_workers=0,
    worker_init_fn=zds.zarrdataset_worker_init_fn,
    batch_size=1,
)

for i, (wid, pos, sample) in enumerate(my_dataloader):
    print(pos)

result:

tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])

Is there a reason it doesn't return the following, or is it a bug?

tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
tensor([[[0, 1],
         [0, 1],
         [4, 6]]])

Answer 1 · 2024-04-25T14:30:24.000Z

Regarding a possible solution to this issue:

In the case of

z = zarr.zeros((1, 1, 5), chunks=(1, 1, 4), dtype='uint8')
zarr.save(filename, z)
patch_size = dict(Z=1, Y=1, X=2)

what would you expect as output?

Exact grid

tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
tensor([[[0, 1],
         [0, 1],
         [4, 5]]]) # <--- this one is smaller than the model might expect

Cropped grid

tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
# <--- But the border of this image won't be represented

Adapted grid

tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
tensor([[[0, 1],
         [0, 1],
         [3, 5]]]) # <--- But the img[...,4] will be loaded twice

Intuitively, I would choose the adapted grid solution.
This is what I tried to implement here (I opened a draft pull request just to show the differences): #6
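
To make the three options concrete, here is a minimal one-axis sketch in plain Python. The function name patch_grid_1d and the mode strings are hypothetical, chosen only for this illustration; this is not the code from #6 nor part of the zarrdataset API.

```python
def patch_grid_1d(length, patch, mode="adapted"):
    """Return (start, stop) pairs along one axis of size `length`.

    Hypothetical helper for illustration only (not zarrdataset code).
    mode="exact":   the last patch is truncated at the border
    mode="cropped": the incomplete border patch is dropped
    mode="adapted": the last patch is shifted back so it ends at the border
    """
    if mode not in ("exact", "cropped", "adapted"):
        raise ValueError(f"unknown mode: {mode!r}")

    n_full = length // patch
    spans = [(i * patch, (i + 1) * patch) for i in range(n_full)]

    remainder = length - n_full * patch
    if remainder and mode == "exact":
        spans.append((n_full * patch, length))
    elif remainder and mode == "adapted":
        # Overlaps the previous patch by (patch - remainder) elements.
        # If the axis is shorter than one patch, the span is clamped to 0.
        spans.append((max(length - patch, 0), length))
    return spans


for mode in ("exact", "cropped", "adapted"):
    print(mode, patch_grid_1d(5, 2, mode))
```

With length 5 and patch size 2, this reproduces the X ranges listed above for each option. When the length is a multiple of the patch size, all three modes give the same grid.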

Answer 2 · 2024-04-29T13:09:37.000Z

Hi @ClementCaporal, thanks for noticing this issue!

The reason for missing patches from non-full chunks is closely related to the previous way of computing patch locations, which was based on the chunk size instead of the patch size.
I'll review your pull request and iterate there to find a solution, but I think that it will probably be solved by #4.

Thanks again!

Answer 3 · 2024-05-07T21:13:06.000Z

This is solved by PR #4, where the patch size is used as the base to compute the sampleable chunks in the input image.
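
For readers who want to see why the base unit matters, the following is a rough one-axis sketch of the two approaches. Both functions are hypothetical and only assume that the earlier behavior visited chunks lying entirely inside the array; they are not taken from zarrdataset or from PR #4.

```python
def chunk_based_patches(length, chunk, patch):
    """Enumerate patches only inside chunks that fit entirely in the array.

    Assumed model of the earlier behavior, for illustration only.
    """
    spans = []
    for c0 in range(0, length - chunk + 1, chunk):
        for p0 in range(c0, c0 + chunk - patch + 1, patch):
            spans.append((p0, p0 + patch))
    return spans


def patch_based_patches(length, patch):
    """Enumerate every full patch that fits in the array, ignoring chunks."""
    return [(p0, p0 + patch) for p0 in range(0, length - patch + 1, patch)]


# Shapes from the question: X size 6, chunk size 4, patch size 2.
print(chunk_based_patches(6, 4, 2))
print(patch_based_patches(6, 2))
```

Under that assumption, the chunk-based grid skips the partial chunk covering X = 4 to 6, which matches the two positions reported in the question, while the patch-based grid also yields the third patch.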