JuliaIO/Zarr.jl

Benchmark vs ZFP

miguelraz opened this issue · 2 comments

zfp is a library by LLNL that also does multidimensional array compression.

It would be nice to see benchmarks against their vectorized (SIMD) algorithms.

You can easily use their library via ]add zfp_jll on recent Julia versions and call the shared library methods.

cc @aviks, who helped set up the zfp_jll wrapper and might have some usage examples lying around.

Thanks for bringing this to our attention. Zarr itself does not implement any compression algorithm on its own.

However, it would be nice to add zfp as an additional compression algorithm. It is already wrapped in the Python implementation of zarr (https://numcodecs.readthedocs.io/en/latest/zfpy.html). I have just not yet come across a dataset that uses it, so it is not implemented yet.

If anyone wants to tackle this: compressors are currently defined in https://github.com/meggart/Zarr.jl/blob/master/src/Compressors.jl. In this case it would probably be necessary to slightly modify the compressor interface, because so far the compressors have operated on pure byte arrays, so a compressor does not see any information on the structure of the chunk (size and number of dimensions). Maybe we would need two abstract types, ByteCompressor and NDArrayCompressor, which would also be useful when we add JPEG2000 compression or similar in the future.
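To make the idea concrete, here is a rough sketch of how such a split could look. All names in it (AbstractCompressor, IdentityBytes, DeltaDim1, encode_chunk, decode_chunk, compress_bytes, decompress_bytes, to_bytes, from_bytes) are made up for illustration and are not the current Zarr.jl interface. The toy DeltaDim1 codec is not zfp; it only stands in for a codec that needs to know the shape of the chunk.

```julia
# Hypothetical sketch: these type and function names are illustrative only
# and are not part of the current Zarr.jl API.
abstract type AbstractCompressor end
abstract type ByteCompressor    <: AbstractCompressor end  # sees a flat byte vector
abstract type NDArrayCompressor <: AbstractCompressor end  # sees the chunk with its shape

# Toy byte-level codec (identity); stands in for a real byte-oriented codec.
struct IdentityBytes <: ByteCompressor end
compress_bytes(::IdentityBytes, bytes::Vector{UInt8}) = copy(bytes)
decompress_bytes(::IdentityBytes, bytes::Vector{UInt8}) = copy(bytes)

# Toy shape-aware codec: differences along the first dimension.
# It resembles zfp only in that it needs the chunk's dimensions.
struct DeltaDim1 <: NDArrayCompressor end

# Helpers to move between a dense array and its raw bytes (column-major order).
to_bytes(a::Array) = collect(reinterpret(UInt8, vec(a)))
from_bytes(::Type{T}, bytes::Vector{UInt8}, dims::Dims) where {T} =
    reshape(collect(reinterpret(T, bytes)), dims)

# Byte compressors receive the flattened chunk; the codec never sees the shape.
encode_chunk(c::ByteCompressor, chunk::Array) = compress_bytes(c, to_bytes(chunk))
decode_chunk(c::ByteCompressor, bytes::Vector{UInt8}, ::Type{T}, dims::Dims) where {T} =
    from_bytes(T, decompress_bytes(c, bytes), dims)

# N-d compressors receive the array itself and can exploit its structure.
function encode_chunk(::DeltaDim1, chunk::Array{T}) where {T<:Integer}
    out = copy(chunk)
    n = size(chunk, 1)
    tail = selectdim(out, 1, 2:n)          # view of rows 2..n of the copy
    tail .-= selectdim(chunk, 1, 1:n-1)    # subtract the preceding row
    return to_bytes(out)
end
function decode_chunk(::DeltaDim1, bytes::Vector{UInt8}, ::Type{T}, dims::Dims) where {T<:Integer}
    # Undo the differencing with a running sum along the first dimension.
    return accumulate(+, from_bytes(T, bytes, dims); dims=1)
end

# Both kinds of compressor round-trip through the same two entry points.
chunk = [1 10; 2 20; 4 40]
for c in (IdentityBytes(), DeltaDim1())
    bytes = encode_chunk(c, chunk)
    @assert decode_chunk(c, bytes, eltype(chunk), size(chunk)) == chunk
end
```

With a split like this, the array code would always call encode_chunk and decode_chunk, and dispatch on the abstract type would decide whether the chunk is flattened before it reaches the codec.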

There was some discussion on the Python implementation of zfp compression, which ran into a similar problem with arrays being flattened during compression, as I described above: zarr-developers/numcodecs#303