lz4/lz4-java

Support for dependent blocks in decompression

cnuernber opened this issue · 3 comments

While reading an Apache Arrow file we got:

Dependent block stream is unsupported (BLOCK_INDEPENDENCE must be set).

Is there any interest in supporting this feature? Our system already decompresses columns in parallel, so block-level parallelism during decompression isn't necessary. My thought is to simply concatenate all the blocks and decompress them in one shot.
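For anyone who wants to check up front whether a file will hit this error, here is a minimal sketch that inspects the frame header. It relies only on the LZ4 frame format: a little-endian magic number 0x184D2204 followed by the FLG byte, where bit 5 is the block independence flag and bits 7-6 are the version. The class and method names are hypothetical and not part of lz4-java, and I am assuming this flag is what the exception above refers to.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of lz4-java.
final class Lz4FrameFlags {
    // LZ4 frame magic number, stored little-endian on disk (04 22 4D 18).
    static final int MAGIC = 0x184D2204;
    // Bit 5 of the FLG byte: 1 = blocks are independent, 0 = blocks are linked.
    static final int BLOCK_INDEPENDENCE_BIT = 1 << 5;

    /** Returns true if the frame header declares independent blocks. */
    static boolean hasIndependentBlocks(byte[] header) {
        if (header.length < 5) {
            throw new IllegalArgumentException("need at least 5 header bytes");
        }
        int magic = ByteBuffer.wrap(header, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        if (magic != MAGIC) {
            // Skippable frames use a different magic number and are not handled here.
            throw new IllegalArgumentException("not an LZ4 frame");
        }
        int flg = header[4] & 0xFF;
        // Bits 7-6 carry the format version, which must be 01.
        if ((flg >>> 6) != 1) {
            throw new IllegalArgumentException("unsupported frame version");
        }
        return (flg & BLOCK_INDEPENDENCE_BIT) != 0;
    }
}
```

Passing the first five bytes of the compressed buffer to `Lz4FrameFlags.hasIndependentBlocks` returns false for a frame whose blocks are linked, which is the case the current reader rejects.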

The workaround for this is to use zstd. Unfortunately, lz4 is the default format for many of these pathways.

The Go code manually resizes the dictionary - https://github.com/pierrec/lz4/blob/v4/reader.go#L180.
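To make the dictionary handling concrete, here is a rough sketch of the bookkeeping a linked-block reader needs. LZ4 match offsets are at most 65535 bytes, so keeping the most recent 64 KiB of decoded output is enough to serve as the dictionary for the next block. The class below is hypothetical and does no decompression itself; it only maintains that window. The resulting bytes would then go to a block decompressor that accepts a dictionary (in the C library, `LZ4_decompress_safe_usingDict`).

```java
import java.util.Arrays;

// Hypothetical helper: tracks the last 64 KiB of decoded output.
final class LinkedBlockWindow {
    static final int MAX_WINDOW = 64 * 1024;
    private byte[] window = new byte[0];

    /** Appends freshly decoded bytes and keeps only the most recent MAX_WINDOW bytes. */
    void append(byte[] decoded, int off, int len) {
        if (len >= MAX_WINDOW) {
            // The new block alone fills the window; older history is no longer reachable.
            window = Arrays.copyOfRange(decoded, off + len - MAX_WINDOW, off + len);
            return;
        }
        // Keep as much of the old tail as still fits next to the new bytes.
        int keep = Math.min(window.length, MAX_WINDOW - len);
        byte[] next = new byte[keep + len];
        System.arraycopy(window, window.length - keep, next, 0, keep);
        System.arraycopy(decoded, off, next, keep, len);
        window = next;
    }

    /** Returns a copy of the current dictionary for the next block. */
    byte[] dictionary() {
        return window.clone();
    }
}
```

A reader would call `append` after each decoded block and hand `dictionary()` to the decompressor before the next one. Copying on every block is simple but not the fastest option; a ring buffer would avoid the reallocation.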

The Java code completely hides the dictionary, which I think makes this impossible to do via simple updates to LZ4FrameInputStream.

@jpountz - Is it a viable pathway to do a simple update to the Java bindings in order to support dependent blocks? Another pathway would be to just call the C library directly via FFI bindings.

I was able to work around this (hopefully temporarily) using FFI bindings to the C library. Unfortunately, this means users need to ensure liblz4 is available on their system.