HadoopLzoCompressor vs LzoCompressor?

Question

dbtsai opened this issue 4 years ago · 1 comments

We are trying to add LzoCodec to Apache Hadoop, based on the implementation in aircompressor (apache/hadoop#2159).

When we try to integrate it into Hadoop, we get a couple of test failures due to java.lang.UnsupportedOperationException: LZO block compressor is not supported. We found that this happens because LzoCodec in aircompressor has a static class HadoopLzoCompressor, a dummy implementation that is returned when getCompressor is called. Why don't we return LzoCompressor instead?

Answer 1 · 2023-03-03T20:45:58.000Z

The Hadoop block compressor/decompressor interfaces are not supported, but the streaming interfaces are. The codecs in this project are designed for interacting with data lake file formats. These either use the streaming interface or, in the case of modern formats like ORC, Parquet, and Avro, use the compression algorithms directly, bypassing the Hadoop codecs. We don't intend to add implementations of the Hadoop block APIs, but I expect you could easily build them yourself using the underlying compression implementations in this project. Feel free to fork/copy them into the Hadoop code base.
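
As a rough illustration of what "build them yourself" could look like, here is a minimal sketch of an adapter that buffers input, compresses it as one block once finish() is called, and then lets the caller drain the result in pieces. Everything in it is an assumption made for illustration: RawBlockCodec, DeflateCodec and BufferingCompressor are hypothetical names, the adapter only borrows a subset of the method names from Hadoop's org.apache.hadoop.io.compress.Compressor (the real interface also has setDictionary, getBytesRead, getBytesWritten, end and reinit, and its contract with Hadoop's block streams is stricter than what is shown here), and java.util.zip.Deflater stands in for LZO so that the example runs on the JDK alone. In a real port the codec would presumably delegate to aircompressor's LzoCompressor instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class BlockCompressorSketch {

    /** Hypothetical stand-in for a raw block codec such as aircompressor's LzoCompressor. */
    interface RawBlockCodec {
        int maxCompressedLength(int uncompressedSize);
        int compress(byte[] in, int inOff, int inLen, byte[] out, int outOff, int maxOutLen);
    }

    /** Deflate-backed codec so the sketch runs on the JDK alone. This is NOT LZO. */
    static final class DeflateCodec implements RawBlockCodec {
        @Override
        public int maxCompressedLength(int n) {
            return n + (n >> 3) + (n >> 6) + 64; // generous upper bound for zlib output
        }

        @Override
        public int compress(byte[] in, int inOff, int inLen, byte[] out, int outOff, int maxOutLen) {
            Deflater deflater = new Deflater();
            try {
                deflater.setInput(in, inOff, inLen);
                deflater.finish();
                int written = deflater.deflate(out, outOff, maxOutLen);
                if (!deflater.finished()) {
                    throw new IllegalStateException("output buffer too small");
                }
                return written;
            } finally {
                deflater.end();
            }
        }
    }

    /** Adapter: buffers input, compresses it as one block after finish(), then drains the result. */
    static final class BufferingCompressor {
        private final RawBlockCodec codec;
        private byte[] input = new byte[0];
        private int inputLen;
        private byte[] output = new byte[0];
        private int outputPos;
        private int outputLen;
        private boolean finishCalled;

        BufferingCompressor(RawBlockCodec codec) { this.codec = codec; }

        void setInput(byte[] b, int off, int len) {
            if (inputLen + len > input.length) {
                input = Arrays.copyOf(input, Math.max(inputLen + len, input.length * 2));
            }
            System.arraycopy(b, off, input, inputLen, len);
            inputLen += len;
        }

        boolean needsInput() { return !finishCalled; }

        void finish() { finishCalled = true; }

        boolean finished() { return finishCalled && inputLen == 0 && outputPos == outputLen; }

        /** Copies up to len compressed bytes into b; compresses the buffered block on first use. */
        int compress(byte[] b, int off, int len) {
            if (finishCalled && inputLen > 0 && outputPos == outputLen) {
                output = new byte[codec.maxCompressedLength(inputLen)];
                outputLen = codec.compress(input, 0, inputLen, output, 0, output.length);
                outputPos = 0;
                inputLen = 0;
            }
            int n = Math.min(len, outputLen - outputPos);
            System.arraycopy(output, outputPos, b, off, n);
            outputPos += n;
            return n;
        }
    }

    /** Drives the adapter the way a caller might: set input, finish, drain in small chunks. */
    static byte[] compressAll(byte[] data) {
        BufferingCompressor compressor = new BufferingCompressor(new DeflateCodec());
        compressor.setInput(data, 0, data.length);
        compressor.finish();
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] chunk = new byte[16]; // deliberately small to exercise partial drains
        while (!compressor.finished()) {
            result.write(chunk, 0, compressor.compress(chunk, 0, chunk.length));
        }
        return result.toByteArray();
    }

    /** Inverse of DeflateCodec, used only to check the round trip. */
    static byte[] inflate(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            return Arrays.copyOf(out, inflater.inflate(out));
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] input = "hello hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = compressAll(input);
        System.out.println("round trip ok: " + Arrays.equals(input, inflate(compressed, input.length)));
    }
}
```

The adapter keeps the raw codec behind a small interface so that the buffering logic does not depend on which algorithm is plugged in. A real Hadoop implementation would also need a matching decompressor and whatever block framing the Hadoop LZO format expects, neither of which is covered here.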