astropy/asdf-astropy

setting compression on individual columns of astropy table


Is there a way to set compression on individual columns of an astropy table? In the following example, passing all_array_compression to write_to compresses the column, but calling AsdfFile.set_array_compression() on the column does not.

```python
import numpy as np
from astropy.table import Table
import asdf

t = Table(data=dict(col=np.ones(1)))

# Baseline: no compression
with asdf.AsdfFile(tree=dict(table=t)) as af:
    af.write_to('test.asdf')

# Compress every array block: this works
with asdf.AsdfFile(tree=dict(table=t)) as af:
    af.write_to('test_compressed.asdf', all_array_compression='zlib')

# Compress only the column
with asdf.AsdfFile(tree=dict(table=t)) as af:
    af.set_array_compression(af['table']['col'], 'zlib')  # this has no effect
    af.write_to('test_compressed_col.asdf')
```

Comparing test.asdf with test_compressed_col.asdf shows that they have identical checksums (and the binary block header carries no zlib compression label), so set_array_compression had no effect.
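Instead of comparing checksums, the compression label can be read directly from each binary block header. The sketch below is a stdlib-only reader that assumes the block layout described in the ASDF Standard: a `\xd3BLK` magic token, a 2-byte big-endian header size, then a 32-bit flags field, a 4-byte compression label, three 64-bit sizes, and a 16-byte MD5 checksum. `block_compressions` is a hypothetical helper, not part of the asdf API, and it does not handle every case (for example, streamed blocks).

```python
import struct

BLOCK_MAGIC = b"\xd3BLK"  # ASDF block magic token
# Fields following the 2-byte header_size: flags (uint32), compression (4 bytes),
# allocated_size, used_size, data_size (uint64 each), checksum (16 bytes)
HEADER_FMT = ">I4sQQQ16s"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 48 bytes


def block_compressions(raw: bytes) -> list:
    """Return the compression label of each binary block (None if uncompressed)."""
    labels = []
    pos = raw.find(BLOCK_MAGIC)  # first block follows the YAML tree
    while pos != -1 and raw[pos:pos + 4] == BLOCK_MAGIC:
        (header_size,) = struct.unpack(">H", raw[pos + 4:pos + 6])
        header_start = pos + 6
        _flags, compression, allocated, _used, _data_size, _checksum = struct.unpack(
            HEADER_FMT, raw[header_start:header_start + HEADER_LEN]
        )
        # An all-zero label means the block is stored uncompressed
        labels.append(compression.rstrip(b"\0").decode("ascii") or None)
        # The next block starts after this header and its allocated data space
        pos = header_start + header_size + allocated
    return labels
```

For example, `block_compressions(f.read())` on a file opened with `open('test_compressed_col.asdf', 'rb')` would be expected to report `'zlib'` for the column block if per-column compression had taken effect.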

I've tried variants such as

```python
af.set_array_compression(af['table']['col'].base, 'zlib')
af.set_array_compression(af['table']['col'].data, 'zlib')
```

but couldn't get any of them to work.

I dug around in the source code a bit, and it looks like asdf compares the ultimate ndarray base to decide whether two arrays are the same. Perhaps a copy is being made somewhere that defeats this detection.
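The "ultimate base" idea can be illustrated with plain NumPy: following `.base` links from a view leads back to the array that owns the memory, while a copy owns its own memory and so resolves to a different object. `ultimate_base` here is a hypothetical helper written for illustration, not asdf's internal implementation.

```python
import numpy as np


def ultimate_base(arr: np.ndarray) -> np.ndarray:
    """Follow .base links until reaching the array that owns the memory."""
    while isinstance(arr.base, np.ndarray):
        arr = arr.base
    return arr


data = np.ones(4)
view = data[1:]      # a view: shares memory with data
copy = data.copy()   # a copy: owns new memory

# A view resolves to the same owner, so an identity-based match would succeed
print(ultimate_base(view) is data)  # True
# A copy resolves to itself, so a setting registered on `data` would be missed
print(ultimate_base(copy) is data)  # False
```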

@braingram or @eslavich what are your opinions?

Thanks for opening this issue!

Unfortunately, this appears to be unsupported without using the legacy extension API (specifically, the reserve_blocks hook).

The call to set_array_compression uses the id of the column array to create an internal block that stores the zlib compression setting. However, write_to calls block_manager.find_used_blocks, which examines all internal blocks (including the one created by set_array_compression) and discards any that do not appear to be used:
https://github.com/asdf-format/asdf/blob/master/asdf/block.py#L553-L557
find_used_blocks relies on the reserve_blocks hook, which is currently supported only by legacy extensions (asdf-astropy uses the new-style converters), and checks each node in the tree to see whether that node has blocks that should be kept. Because no node claims the block created by set_array_compression (here the table node should claim it, but ASDF does not currently provide a way to do so), the block is discarded and the compression setting is lost.
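The pruning step can be pictured with a deliberately simplified toy model. `ToyBlockManager`, `register_compression`, `prune_unclaimed`, and `compression_for` are hypothetical names written for illustration; asdf's real block manager is more involved and its methods and signatures differ. The model only captures the behavior described above: settings are keyed by the array's id, and any setting whose array no tree node claims is thrown away before writing.

```python
class ToyBlockManager:
    """Hypothetical, heavily simplified model of unused-block pruning."""

    def __init__(self):
        self._blocks = {}  # id(array) -> compression label

    def register_compression(self, array, compression):
        # Like set_array_compression, key the setting on the array's id
        self._blocks[id(array)] = compression

    def prune_unclaimed(self, claimed_arrays):
        # Keep only blocks whose array some tree node has claimed
        claimed = {id(a) for a in claimed_arrays}
        self._blocks = {k: v for k, v in self._blocks.items() if k in claimed}

    def compression_for(self, array):
        return self._blocks.get(id(array))


col = [1.0]  # stands in for the table column's array

mgr = ToyBlockManager()
mgr.register_compression(col, "zlib")
mgr.prune_unclaimed(claimed_arrays=[])     # no node claims the column's block
print(mgr.compression_for(col))            # None: the setting was lost

mgr = ToyBlockManager()
mgr.register_compression(col, "zlib")
mgr.prune_unclaimed(claimed_arrays=[col])  # a node (e.g. the table) claims it
print(mgr.compression_for(col))            # zlib
```

In this picture, supporting per-column compression amounts to giving the new-style table converter a way to claim its columns' blocks, which is the gap described above.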

We (the asdf developers) are currently working on fleshing out the new-style extensions to support all the features of the legacy extension API/type system. This is a good example case that we should strive to support.