bioformats2raw produces big-endian data type (>u2) from MRC and CZI of unsigned int 16
The input pixels, which are unsigned 16-bit integers, are being written as big-endian unsigned 16-bit integers in the zarr arrays when converted with bioformats2raw. I am using version 0.6.1 of bioformats2raw, and this occurs on both a Mac M1 and a Linux Intel machine (both little-endian).
$ ~/scratch/bioformats2raw-0.6.1/bin/bioformats2raw --compression null EBOV_VSV_1.mrc test.zarr
$ cat test.zarr//0/0/.zarray
{
"chunks" : [ 1, 1, 1, 1024, 1024 ],
"compressor" : null,
"dtype" : ">u2",
"fill_value" : 0,
"filters" : null,
"order" : "C",
"shape" : [ 1, 1, 35, 4096, 4096 ],
"dimension_separator" : "/",
"zarr_format" : 2
}
The '>' character in the dtype field indicates the pixel data type is non-native big-endian.
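To illustrate what the '>' prefix means in practice, here is a minimal NumPy sketch. The sample value and variable names are illustrative only and are not taken from the converted file; the point is that the same number is stored with its bytes in opposite order, while the decoded value is identical.

```python
import numpy as np

# Illustrative value only; 0x0102 makes the two bytes easy to tell apart.
value = 0x0102

big = np.array([value], dtype=">u2")     # big-endian, as in the .zarray above
little = np.array([value], dtype="<u2")  # little-endian

print(big.tobytes())         # most significant byte first: b'\x01\x02'
print(little.tobytes())      # least significant byte first: b'\x02\x01'
print(big[0] == little[0])   # the decoded values are equal: True
```

Only the on-disk byte layout differs; NumPy decodes both arrays to the same integers.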
Converting a CZI file likewise produced big-endian unsigned 16-bit data in the zarr array.
The expectation was that the byte order would match the system.
This behavior is expected; bioformats2raw does not consider the system's native byte order, so it produces the same output regardless of the system on which it is run.
@blowekamp : is this causing a problem, or is it something we just need to better document?
@melissalinkert Thank you for the response. It is something that certainly can be handled.
What was the reasoning for choosing big-endian over little-endian, the more common CPU byte order?
P.S. Where I ran into the issue was with numpy/dask checking if the dtype was np.uint16 and it was not matching.
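For readers who hit the same mismatch, a minimal sketch of one way to handle it on the NumPy side follows. The helper names `is_uint16` and `to_native` are hypothetical, not part of NumPy, dask, or bioformats2raw. On a little-endian machine a direct `dtype == np.uint16` comparison fails for `>u2`, so the sketch compares kind and item size instead, and converts to native byte order when a strict dtype match is required.

```python
import numpy as np


def is_uint16(dtype):
    """Hypothetical helper: True for any 16-bit unsigned dtype, either byte order."""
    dtype = np.dtype(dtype)
    return dtype.kind == "u" and dtype.itemsize == 2


def to_native(arr):
    """Hypothetical helper: return the array in the machine's native byte order.

    Values are preserved; data is copied only if the byte order actually differs.
    """
    return arr.astype(arr.dtype.newbyteorder("="), copy=False)


# Stand-in for a chunk read from the zarr array (not real data).
chunk = np.array([1, 256, 65535], dtype=">u2")

print(is_uint16(chunk.dtype))        # True on any platform
native = to_native(chunk)
print(native.dtype == np.uint16)     # True after conversion
print(native.tolist())               # [1, 256, 65535]
```

Checking `kind` and `itemsize` avoids a copy when only the type needs validating; the conversion is worth doing when downstream code insists on a native `np.uint16` array.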
The choice to write only big-endian dates to a time when bioformats2raw wrote n5, which only supports big-endian data - see f9bfa06 and https://github.com/saalfeldlab/n5#file-system-specification. We've left that in place for simplicity, particularly as Java ByteBuffers default to big-endian.
I'll assume that choices were made to follow the "Network Byte Order" of big endian at some point.