HadoopLzoCompressor vs LzoCompressor?
dbtsai opened this issue · 1 comments
We are trying to add `LzoCodec` to Apache Hadoop based on the implementation in aircompressor (apache/hadoop#2159).
When we try to integrate it into Hadoop, we get a couple of test failures due to `java.lang.UnsupportedOperationException: LZO block compressor is not supported`. We found this happens because `LzoCodec` in aircompressor has a static class `HadoopLzoCompressor` that returns a dummy implementation when `getCompressor` is called. Why don't we return `LzoCompressor` instead?
The Hadoop block compressor/decompressor interfaces are not supported, but the streaming interfaces are. The codecs in this project are designed for interacting with data lake file formats. These either use the streaming interface or, in the case of modern formats like ORC, Parquet and Avro, use the compression algorithms directly, bypassing the Hadoop codecs. We don't intend to add implementations of the Hadoop block APIs, but I expect you could easily build them yourself using the underlying compression implementations in this project. Feel free to fork/copy into the Hadoop code base.
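As a rough illustration of what "build them yourself" could look like, here is a minimal sketch of an adapter that puts a Hadoop-style call sequence (`setInput`, `finish`, `compress`, `finished`) on top of a one-shot block compression routine. Everything in it is an assumption made for illustration: `RawBlockCompressor`, `StoredRawCompressor` and `BlockCompressorAdapter` are hypothetical names, the adapter does not implement the real `org.apache.hadoop.io.compress.Compressor` interface, and the stand-in codec just copies bytes so the example runs without any dependencies. In a real integration the raw compressor would be the LZO implementation from this project.

```java
import java.util.Arrays;

/** Hypothetical one-shot block compression routine (stand-in for a real LZO compressor). */
interface RawBlockCompressor {
    int maxCompressedLength(int uncompressedSize);

    int compress(byte[] input, int inputOffset, int inputLength,
                 byte[] output, int outputOffset, int maxOutputLength);
}

/** Placeholder codec that stores bytes unchanged; swap in a real compressor here. */
final class StoredRawCompressor implements RawBlockCompressor {
    @Override
    public int maxCompressedLength(int uncompressedSize) {
        return uncompressedSize;
    }

    @Override
    public int compress(byte[] input, int inputOffset, int inputLength,
                        byte[] output, int outputOffset, int maxOutputLength) {
        if (inputLength > maxOutputLength) {
            throw new IllegalArgumentException("output buffer too small");
        }
        System.arraycopy(input, inputOffset, output, outputOffset, inputLength);
        return inputLength;
    }
}

/** Simplified adapter mirroring the method names of a Hadoop-style block compressor. */
final class BlockCompressorAdapter {
    private final RawBlockCompressor raw;
    private byte[] input = new byte[0];   // uncompressed block waiting to be compressed
    private byte[] pending = new byte[0]; // compressed bytes not yet handed to the caller
    private int pendingPos;
    private boolean finishCalled;

    BlockCompressorAdapter(RawBlockCompressor raw) {
        this.raw = raw;
    }

    /** Accepts one block of input; the bytes are copied so the caller may reuse its buffer. */
    void setInput(byte[] b, int off, int len) {
        input = Arrays.copyOfRange(b, off, off + len);
    }

    boolean needsInput() {
        return !finishCalled && input.length == 0 && pendingPos == pending.length;
    }

    void finish() {
        finishCalled = true;
    }

    boolean finished() {
        return finishCalled && input.length == 0 && pendingPos == pending.length;
    }

    /** Compresses the buffered block on first call, then drains it in caller-sized pieces. */
    int compress(byte[] out, int off, int len) {
        if (pendingPos == pending.length && input.length > 0) {
            byte[] buf = new byte[raw.maxCompressedLength(input.length)];
            int n = raw.compress(input, 0, input.length, buf, 0, buf.length);
            pending = Arrays.copyOf(buf, n);
            pendingPos = 0;
            input = new byte[0];
        }
        int n = Math.min(len, pending.length - pendingPos);
        System.arraycopy(pending, pendingPos, out, off, n);
        pendingPos += n;
        return n;
    }
}
```

A caller would feed one block with `setInput`, call `finish`, and then loop on `compress` until `finished` returns true. The sketch leaves out the parts of the real Hadoop contract that a production version would need, such as `reset`, `end`, byte counters and dictionary support.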