Primary language: Java. License: BSD 2-Clause "Simplified" (BSD-2-Clause).

n5-zarr

Zarr filesystem backend for N5.

This library provides best-effort compatibility with existing Zarr v2 data stored in filesystem containers. So far, I have tested:

  • self-consistency, i.e. data writing and reading in this implementation works,
  • reading of some example zarr containers written with Python (check the examples)

Build instructions

Blosc compressors depend on n5-blosc, which depends on libblosc1. On Ubuntu 18.04 or later, install it with:

sudo apt-get install -y libblosc1

On other platforms, please check the installation instructions for JBlosc.

Build and install:

mvn clean install

The build also works if libblosc1 is not available, just without Blosc compression.

Supported Zarr features

This implementation currently supports the following Zarr features:

Groups
as described by the Zarr v2 spec for hierarchies.
N-dimensional arrays in C and F order
for efficiency, both array types are opened without changing the order of elements in memory. Only the chunk order and shape are adjusted such that chunks form the correct transposed tensor. When used via n5-imglib2, a view with permuted axes can easily be created.
Arbitrary meta-data
stored as JSON for both groups and datasets.
Compression
currently, only the most relevant compression schemes (Zstandard, Blosc, GZip, Zlib, and BZ2) are supported. Others can be added later as necessary.
Primitive types as little and big endian
so far, I have tested unsigned and signed integers with 1, 2, 4, and 8 bytes, and floats with 4 and 8 bytes. The behavior for other types is untested because I did not have meaningful examples. Complex numbers should be mapped into the best matching primitive real type. Other NumPy data types such as strings, timedeltas, objects, dates, or others should come out as uncompressed bytes.
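The "C and F order" item above relies on a simple identity: a C-order (last axis fastest) array has the same memory layout as an F-order (first axis fastest) array with reversed shape and reversed indices. The following is a minimal, stdlib-only sketch of that identity, not the code used in this library. The class and method names (OrderSketch, reversed, offsetC, offsetF) and the example shape are made up for illustration.

```java
final class OrderSketch {

    /** Returns a reversed copy of the given array. */
    static long[] reversed(final long[] a) {
        final long[] r = new long[a.length];
        for (int i = 0; i < a.length; ++i)
            r[i] = a[a.length - 1 - i];
        return r;
    }

    /** Linear element offset in a C-order array (last axis varies fastest). */
    static long offsetC(final long[] shape, final long[] index) {
        long offset = 0;
        for (int d = 0; d < shape.length; ++d)
            offset = offset * shape[d] + index[d];
        return offset;
    }

    /** Linear element offset in an F-order array (first axis varies fastest). */
    static long offsetF(final long[] shape, final long[] index) {
        long offset = 0;
        for (int d = shape.length - 1; d >= 0; --d)
            offset = offset * shape[d] + index[d];
        return offset;
    }

    public static void main(final String[] args) {
        // Hypothetical C-order Zarr array with shape (z, y, x) = (2, 3, 4).
        final long[] zarrShape = {2, 3, 4};
        final long[] zarrIndex = {1, 2, 3};

        // Reversing shape and index gives an (x, y, z) view with x fastest,
        // which addresses the same element without moving any data.
        final long[] reversedShape = reversed(zarrShape);
        final long[] reversedIndex = reversed(zarrIndex);

        System.out.println(offsetC(zarrShape, zarrIndex));          // prints 23
        System.out.println(offsetF(reversedShape, reversedIndex));  // prints 23
    }
}
```

The same reversal applies to chunk shapes and chunk grid positions, which is why only metadata needs to be adjusted and no elements need to be copied.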

Fill value behavior is not well tested because I am not sure what the best solution is. Fill values are currently interpreted as numbers (if possible), including the special cases "NaN", "Infinity", and "-Infinity" for floating point numbers. Expect some hiccups for other data.
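As a rough illustration of the fill value interpretation described above, here is a minimal sketch that turns the text of a fill value into a double, handling the three special strings explicitly. It is not the parsing code of this library. The class name FillValueSketch, the method parseFillValue, and the fallback parameter for non-numeric input are assumptions made for this example.

```java
final class FillValueSketch {

    /**
     * Interprets the text of a fill value as a double.
     * Returns the given fallback if the text is null or not numeric
     * (the fallback behavior is an assumption of this sketch).
     */
    static double parseFillValue(final String text, final double fallback) {
        if (text == null)
            return fallback;
        final String trimmed = text.trim();
        switch (trimmed) {
            case "NaN":
                return Double.NaN;
            case "Infinity":
                return Double.POSITIVE_INFINITY;
            case "-Infinity":
                return Double.NEGATIVE_INFINITY;
            default:
                try {
                    return Double.parseDouble(trimmed);
                } catch (final NumberFormatException e) {
                    return fallback;
                }
        }
    }

    public static void main(final String[] args) {
        System.out.println(parseFillValue("NaN", 0));        // prints NaN
        System.out.println(parseFillValue("-Infinity", 0));  // prints -Infinity
        System.out.println(parseFillValue("42", 0));         // prints 42.0
        System.out.println(parseFillValue("abc", 0));        // prints 0.0
    }
}
```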

Filters are not currently supported because I feel ambivalent about them. Please let me know of use cases that I am missing; it is not hard to add data filters of any kind. I just cannot come up with a single thing that I would like to do here right now.

N5 gimmicks

Optionally, N5 dataset attributes ("dimensions", "blockSize", "compression", "dataType") can be virtually mapped such that N5-API based code that reads or writes them directly via general attribute access will see and modify the corresponding .zarray (dataset) attributes. Keep in mind that this will lead to name clashes if a Zarr dataset uses any of these attribute names for other purposes. Try switching the mapping off first if that is an issue.
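The key translation behind such a mapping can be pictured roughly as follows. This is a sketch of the idea only, not the implementation in this library. The Zarr v2 key names ("shape", "chunks", "compressor", "dtype") come from the Zarr v2 array metadata, while the pairing shown here, the class AttributeMappingSketch, the method resolveKey, and the flag mapN5Attributes are assumptions for illustration. A real mapping also has to convert the values, which this sketch leaves out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class AttributeMappingSketch {

    /** Assumed pairing of N5 dataset attribute names with Zarr v2 array metadata keys. */
    static final Map<String, String> N5_TO_ZARRAY;
    static {
        final Map<String, String> m = new HashMap<>();
        m.put("dimensions", "shape");
        m.put("blockSize", "chunks");
        m.put("compression", "compressor");
        m.put("dataType", "dtype");
        N5_TO_ZARRAY = Collections.unmodifiableMap(m);
    }

    /**
     * Returns the key that an attribute request would be redirected to.
     * With the mapping switched off, every key is passed through unchanged,
     * so a user attribute named e.g. "dimensions" no longer clashes.
     */
    static String resolveKey(final String key, final boolean mapN5Attributes) {
        return mapN5Attributes ? N5_TO_ZARRAY.getOrDefault(key, key) : key;
    }

    public static void main(final String[] args) {
        System.out.println(resolveKey("blockSize", true));   // prints chunks
        System.out.println(resolveKey("blockSize", false));  // prints blockSize
        System.out.println(resolveKey("myAttribute", true)); // prints myAttribute
    }
}
```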

Examples

TODO add code examples. For now, have a look at the tests.