jorgecarleitao/arrow2

Specify compression per column instead of globally


Maybe this could use an API similar to how we pass encodings into RowGroupIterator.

This would allow a different compression configuration for each column. It would be very useful when there is a sizeable column of random binary data, such as hashes, which barely compresses. Likewise, if a column already uses RLE/dictionary encoding, there might not be much point in compressing and decompressing it.
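A rough sketch of the shape I have in mind is below. All names here (`ColumnCompression`, `ColumnWriteOptions`, `compression_for`) are hypothetical and are not part of arrow2's current API; the idea is only a global default plus optional per-column overrides, indexed by column position in the same way a per-column list of encodings would be.

```rust
/// Hypothetical per-column compression choice (not arrow2's actual type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ColumnCompression {
    Uncompressed,
    Snappy,
    Zstd,
}

/// Hypothetical write options: a global default plus optional
/// per-column overrides, indexed by column position.
#[derive(Debug, Clone)]
pub struct ColumnWriteOptions {
    pub default: ColumnCompression,
    pub overrides: Vec<Option<ColumnCompression>>,
}

impl ColumnWriteOptions {
    /// Returns the codec to use for the column at `index`, falling back
    /// to the default when there is no override for that column.
    pub fn compression_for(&self, index: usize) -> ColumnCompression {
        self.overrides
            .get(index)
            .copied()
            .flatten()
            .unwrap_or(self.default)
    }
}
```

With this shape, a hash column could be marked `Uncompressed` while every other column keeps the global default, and callers who never set an override would see no change in behavior.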

This would give a significant performance boost for my use case: when I look at timings for querying Parquet, they show that 1/4 to 1/2 of the time is spent decompressing.

I would like to work on this if I can get some guidance on how the public API should be modified.