glencoesoftware/bioformats2raw

java.lang.NegativeArraySizeException with large tiff input with zip compression

blowekamp opened this issue · 3 comments

We are encountering the following error when running bioformats2raw:

2023-09-20 06:33:17,038 [pool-1-thread-1] ERROR c.g.bioformats2raw.Converter - Failure processing chunk; resolution=0 plane=1 xx=4096 yy=0 zz=0 width=1024 height=1024 depth=1
java.lang.NegativeArraySizeException: null
	at ome.codecs.ByteVector.doubleCapacity(ByteVector.java:86)
	at ome.codecs.ByteVector.add(ByteVector.java:75)
	at ome.codecs.ZlibCodec.decompress(ZlibCodec.java:81)
	at ome.codecs.BaseCodec.decompress(BaseCodec.java:194)
	at loci.formats.codec.WrappedCodec.decompress(WrappedCodec.java:86)
	at loci.formats.codec.ZlibCodec.decompress(ZlibCodec.java:48)
	at loci.formats.tiff.TiffCompression.decompress(TiffCompression.java:283)
	at loci.formats.tiff.TiffParser.getTile(TiffParser.java:831)
	at loci.formats.tiff.TiffParser.getSamples(TiffParser.java:1116)
	at loci.formats.tiff.TiffParser.getSamples(TiffParser.java:871)
	at loci.formats.in.MinimalTiffReader.openBytes(MinimalTiffReader.java:312)
	at loci.formats.in.TiffDelegateReader.openBytes(TiffDelegateReader.java:71)
	at loci.formats.FormatReader.openBytes(FormatReader.java:922)
	at loci.formats.ReaderWrapper.openBytes(ReaderWrapper.java:334)
	at loci.formats.ChannelSeparator.openBytes(ChannelSeparator.java:200)
	at loci.formats.ReaderWrapper.openBytes(ReaderWrapper.java:348)
	at loci.formats.MinMaxCalculator.openBytes(MinMaxCalculator.java:269)
	at loci.formats.MinMaxCalculator.openBytes(MinMaxCalculator.java:260)
	at com.glencoesoftware.bioformats2raw.Converter.getTile(Converter.java:1690)
	at com.glencoesoftware.bioformats2raw.Converter.processChunk(Converter.java:1802)
	at com.glencoesoftware.bioformats2raw.Converter.lambda$saveResolutions$4(Converter.java:2004)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

I am suspicious of an integer overflow issue.
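
For illustration, here is a minimal, hypothetical sketch of the failure mode I suspect in ByteVector.doubleCapacity (this is not the actual ome-codecs source): doubling a 32-bit int capacity past 2^30 bytes wraps to a negative value, and allocating an array of that size throws NegativeArraySizeException.

// Hypothetical sketch only, not the ome-codecs implementation.
public class OverflowSketch {
    public static void main(String[] args) {
        long needed = 1_202_587_888L;       // ~1.2 GB decompressed plane, from the sample image below
        int capacity = 1 << 30;             // 1 GiB: the largest power of two that fits in an int
        while (capacity > 0 && capacity < needed) {
            capacity = capacity * 2;        // 2^31 wraps to -2147483648
        }
        System.out.println("doubled capacity = " + capacity);
        byte[] buffer = new byte[capacity]; // throws NegativeArraySizeException when capacity < 0
    }
}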

A sample input to reproduce this error can be created with the following ImageMagick "convert" command line:

convert magick:logo -resize 19477x30872 -depth 16 -compress zip logo_zip.tiff

The compression and bit depth options appear to be required to reproduce this error.

Our pipeline converts a PNG to TIFF before running bioformats2raw. We have found that adding -define tiff:tile-geometry=128x128 to the convert command (full command shown below) not only bypasses the above bug but also improves performance ~60x. Even the slower, untiled TIFF was still faster than directly processing the original large PNG.
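
For reference, the tiled variant of the command above (same parameters; the output file name is arbitrary):

convert magick:logo -resize 19477x30872 -depth 16 -compress zip -define tiff:tile-geometry=128x128 logo_zip_tiled.tiff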

Thanks for reporting this, @blowekamp. The stack trace indicates that the problem is in the ome-codecs library, which bioformats2raw uses via Bio-Formats. A corresponding issue in ome-codecs is now open: ome/ome-codecs#32. We can't really fix it here, but will need to update the Bio-Formats version in bioformats2raw once a fix is available and released.

A sample input to reproduce this error can be created with the following ImageMagick "convert" command line:

convert magick:logo -resize 19477x30872 -depth 16 -compress zip logo_zip.tiff

That means that ~1.2 GB of pixels are being compressed as a single tile. We really don't recommend doing that in general; it is also effectively the layout that large PNGs already have, so the intermediate conversion step is not expected to help very much.
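
For concreteness, that figure follows directly from the sample image's parameters:

19477 x 30872 pixels x 2 bytes per 16-bit sample = 1,202,587,888 bytes ≈ 1.2 GB per channel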

We have found that adding -define tiff:tile-geometry=128x128 to the convert command not only bypasses the above bug but also improves the performance ~60x.

That's definitely expected. In the case where the whole input image is compressed as a single tile, that entire tile must be read and decompressed each time bioformats2raw reads a tile for conversion. Input images that use multiple smaller tiles are expected to perform better overall as individual tiles can be read as needed.
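
To put rough numbers on it: with the default 1024x1024 chunk size visible in the stack trace above, a 19477x30872 plane is read as

ceil(19477 / 1024) x ceil(30872 / 1024) = 20 x 31 = 620 chunks,

and in the single-tile case each of those reads can decompress the full ~1.2 GB tile again, which makes the ~60x speedup reported above unsurprising.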

@melissalinkert Thank you for forwarding the issue to the appropriate project, and for the tips in your response.

Sorry to add on an additional issue here.

We are also converting some CZI files, and the processing seems relatively slow. I'm presuming this is for the same reason: large compressed chunk(s). Is there anything we can do to either preprocess the files, or just load and decompress the input once, to improve performance?

@blowekamp : one thing you might try is checking the optimal tile size reported by Bio-Formats for the .czi files. With showinf -nopix -noflat (included in Bio-Formats command line tools), look for a Tile size = line in the output. You might try setting the tile size in bioformats2raw to that reported tile size; it's not guaranteed, but that may reduce any repeated tile decompressions. Tiles as stored in .czi files often overlap, so requesting a fixed-size tile from the image can require multiple tiles to be read from the file and decompressed.
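
As a sketch of that workflow (the 512 x 512 tile size and file names are made-up examples; use whatever showinf actually reports for your file, and note that the option names below assume a recent bioformats2raw):

showinf -nopix -noflat input.czi
# ...then look for a line like:
#   Tile size = 512 x 512

bioformats2raw --tile_width 512 --tile_height 512 input.czi output.zarr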

I'm not aware of a workflow to pre-process .czi files. If adjusting the tile size doesn't make a positive difference, we'd need more details before suggesting other options: the specific command being run, what kind of data is in the .czi file, the specifications of the system on which conversion is being run, and exact conversion times.