Missing patches when zarr chunks are not "full"
Related to #3 (Inference Sampler)
The grid created by `zds.PatchSampler` doesn't take the border of the zarr array into account when `zarr.shape` is not a multiple of `chunk.shape`.
Small example:
```python
import zarr
import zarrdataset as zds
from torch.utils.data import DataLoader

filename = r"data.zarr"

# create an empty zarr array whose X size (6) is not a multiple of its chunk size (4)
z = zarr.zeros((1, 1, 6), chunks=(1, 1, 4), dtype='uint8')
zarr.save(filename, z)

patch_size = dict(Z=1, Y=1, X=2)
patch_sampler = zds.PatchSampler(patch_size=patch_size)

my_datasets = zds.ZarrDataset(
    [
        zds.ImagesDatasetSpecs(
            filenames=filename,
            source_axes="ZYX",
            axes="ZYX",
        )
    ],
    patch_sampler=patch_sampler,
    return_positions=True,
    return_worker_id=True
)

my_dataloader = DataLoader(my_datasets,
                           num_workers=0,
                           worker_init_fn=zds.zarrdataset_worker_init_fn,
                           batch_size=1
                           )

for i, (wid, pos, sample) in enumerate(my_dataloader):
    print(pos)
```
Result:
```
tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
```
Is there a reason it doesn't return the following instead (or is it a bug)?
```
tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
tensor([[[0, 1],
         [0, 1],
         [4, 6]]])
```
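The expected positions are simply a grid that tiles every axis by the patch size. A minimal sketch of that tiling, using a hypothetical `patch_grid` helper (not part of zarrdataset) and assuming each dimension is a multiple of its patch size:

```python
from itertools import product


def patch_grid(shape, patch):
    """Hypothetical helper: tile every axis by the patch size and return one
    [[start, stop], ...] list per patch, in the same layout as the positions above.
    Assumes each dimension is a multiple of its patch size (true here: 6 = 3 * 2)."""
    # start offsets along each axis, stepping by the patch size
    starts = [range(0, s, p) for s, p in zip(shape, patch)]
    # every combination of per-axis offsets is one patch
    return [
        [[o, o + p] for o, p in zip(offsets, patch)]
        for offsets in product(*starts)
    ]


for pos in patch_grid((1, 1, 6), (1, 1, 2)):
    print(pos)
# [[0, 1], [0, 1], [0, 2]]
# [[0, 1], [0, 1], [2, 4]]
# [[0, 1], [0, 1], [4, 6]]
```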
Regarding a possible solution to this issue: in the case of
```python
z = zarr.zeros((1, 1, 5), chunks=(1, 1, 4), dtype='uint8')
zarr.save(filename, z)
patch_size = dict(Z=1, Y=1, X=2)
```
what output would you expect?
- Exact grid
```
tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
tensor([[[0, 1],
         [0, 1],
         [4, 5]]])  # <--- this one is smaller than the model might expect
```
- Cropped grid
```
tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
# <--- but the border of this image won't be represented
```
- Adapted grid
```
tensor([[[0, 1],
         [0, 1],
         [0, 2]]])
tensor([[[0, 1],
         [0, 1],
         [2, 4]]])
tensor([[[0, 1],
         [0, 1],
         [3, 5]]])  # <--- but img[..., 3] will be loaded twice
```
Intuitively, I would choose the adapted grid.
This is what I tried to implement in #6 (I opened a draft pull request just to show the differences).
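The three options can be compared along a single axis with a small sketch. `grid_1d` and its `mode` values are hypothetical names for illustration only, not the code in #6 or zarrdataset's API; windows are half-open `[start, stop)` ranges, matching the positions printed above:

```python
def grid_1d(length, patch, mode="adapted"):
    """Hypothetical helper: [start, stop) windows of size `patch` along one axis.

    exact:   the last window is clipped at the border (may be smaller than `patch`)
    cropped: a window that would cross the border is dropped
    adapted: the last window is shifted back so that it ends at the border
    """
    if mode not in ("exact", "cropped", "adapted"):
        raise ValueError(f"unknown mode: {mode!r}")
    windows = []
    for start in range(0, length, patch):
        stop = start + patch
        if stop <= length:
            windows.append([start, stop])  # full window, identical in every mode
        elif mode == "exact":
            windows.append([start, length])  # smaller border window
        elif mode == "adapted":
            # shifted back to end at the border; overlaps the previous window
            windows.append([max(length - patch, 0), length])
        # mode == "cropped": the partial window is skipped
    return windows


for mode in ("exact", "cropped", "adapted"):
    print(mode, grid_1d(5, 2, mode))
# exact [[0, 2], [2, 4], [4, 5]]
# cropped [[0, 2], [2, 4]]
# adapted [[0, 2], [2, 4], [3, 5]]
```

With the adapted grid every window keeps the full patch size; the cost is the overlap with the previous window (X = 3 is read by both `[2, 4]` and `[3, 5]`), which any code that stitches per-patch results back together has to account for.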
Hi @ClementCaporal, thanks for noticing this issue!
The missing patches from non-full chunks are closely tied to how patch locations were previously computed: from the chunk size instead of the patch size.
I'll review your pull request and iterate there to find a solution, but I think this will probably be solved by #4.
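That explanation can be illustrated along the X axis with two hypothetical helpers. This is a simplified reconstruction, not zarrdataset's actual code, and it assumes the old sampler only placed windows inside complete chunks (and that the chunk size is a multiple of the patch size). Under that assumption, the chunk-based grid yields exactly the two positions from the original report, while tiling by the patch size recovers the third:

```python
def chunk_based_grid(length, chunk, patch):
    """Hypothetical reconstruction: windows are placed only inside complete
    chunks, so a trailing partial chunk is never sampled."""
    windows = []
    n_full_chunks = length // chunk  # floor division drops the partial chunk
    for c in range(n_full_chunks):
        chunk_start = c * chunk
        for start in range(chunk_start, chunk_start + chunk, patch):
            windows.append([start, start + patch])
    return windows


def patch_based_grid(length, patch):
    """Windows placed directly from the patch size, covering the whole axis
    (assumes length is a multiple of patch)."""
    return [[start, start + patch] for start in range(0, length, patch)]


print(chunk_based_grid(6, 4, 2))  # [[0, 2], [2, 4]]  (X = 4..5 never sampled)
print(patch_based_grid(6, 2))     # [[0, 2], [2, 4], [4, 6]]
```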
Thanks again!