girder/large_image

Zarr sink multiprocessing issue

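The zarr sink doesn't work correctly when more than one process writes to the same sink: a write from a second process can shrink the dimensions that the first process reports. To reproduce: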

In Python REPL 1:

import large_image
import numpy as np
ts = large_image.new()
ts.addTile(np.zeros((1, 1, 1)), x=4095, y=4095, s=3, z=4)  # our image is now 5,4096,4096,4
print(ts.largeImagePath)  # note this name

In Python REPL 2:

import large_image
import numpy as np
ts = large_image.open('<name from above>')  # the path printed in REPL 1
ts.addTile(np.zeros((1, 1, 1)), x=2047, y=2047, s=3, z=2)

In Python REPL 1 again:

print(ts.metadata)  # z is now only 3 long, not 5

To fix this, when we get the level 0 array we need to honor the existing array's size and never resize it to something smaller.
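
As a rough illustration of "never resize down", here is a minimal sketch of a grow-only shape merge. It is not the zarr sink's actual code: `required_shape` and `merged_shape` are hypothetical helpers, and the `(z, y, x, s)` axis order is assumed from the `5,4096,4096,4` comment in the repro above.

```python
def required_shape(x, y, z, s, tile_shape):
    """Smallest (z, y, x, s) extents that contain one tile.

    tile_shape is (height, width, samples); x, y, z, s are the tile's
    offsets along each axis (hypothetical helper for illustration).
    """
    height, width, samples = tile_shape
    return (z + 1, y + height, x + width, s + samples)


def merged_shape(existing, required):
    """Grow-only merge: each axis keeps the larger of the two extents."""
    if len(existing) != len(required):
        raise ValueError('axis count mismatch')
    return tuple(max(e, r) for e, r in zip(existing, required))


# Tile written by the first process, then by the second process.
first = required_shape(x=4095, y=4095, z=4, s=3, tile_shape=(1, 1, 1))
second = required_shape(x=2047, y=2047, z=2, s=3, tile_shape=(1, 1, 1))

print(first)                        # (5, 4096, 4096, 4)
print(merged_shape(first, second))  # (5, 4096, 4096, 4)
```

With a merge like this, the second process's smaller tile leaves the z extent at 5 instead of cutting it to 3.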

This test shows the error:

import subprocess
import sys

import large_image_source_zarr
import numpy as np


def testMultiprocessZarrSink(tmp_path):
    ts = large_image_source_zarr.new()
    ts.addTile(np.zeros((1, 1, 1)), x=4095, y=4095, z=4)
    path = ts.largeImagePath
    subprocess.check_call([sys.executable, '-c', """import large_image_source_zarr
import numpy as np
ts = large_image_source_zarr.open('%s')
ts.addTile(np.zeros((1, 1, 1)), x=2047, y=2047, z=2)
""" % path])

    assert ts.metadata['IndexRange']['IndexZ'] == 5
    assert ts.sizeX == 4096

Curiously, moving the subprocess code into a function and running it with multiprocessing.Process does not reproduce the error.