setting compression on individual columns of astropy table
Is there a syntax to set compression on individual columns of an astropy table? In the following example, using all_array_compression compresses the columns, but using AsdfFile.set_array_compression() does not.
import numpy as np
from astropy.table import Table
import asdf

t = Table(data=dict(col=np.ones(1)))

# Baseline: no compression requested
with asdf.AsdfFile(tree=dict(table=t)) as af:
    af.write_to('test.asdf')

# Compress every array in the file
with asdf.AsdfFile(tree=dict(table=t)) as af:
    af.write_to('test_compressed.asdf',
                all_array_compression='zlib',
                )

# Try to compress only one column
with asdf.AsdfFile(tree=dict(table=t)) as af:
    af.set_array_compression(af['table']['col'], 'zlib')  # this has no effect
    af.write_to('test_compressed_col.asdf')
Comparing test.asdf with test_compressed_col.asdf, we see that they have identical checksums (and there's no zlib tag at the beginning of the binary block). So the set_array_compression call had no effect.
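The comparison described above can be sketched roughly as follows. This is only a quick check, not part of asdf: the helper names `digest` and `block_compressions` are made up for illustration, and the header offsets assume the block layout from the ASDF standard (4-byte magic, 2-byte header size, 4-byte flags, then a 4-byte compression field).

```python
import hashlib

# Assumed from the ASDF standard's block header layout:
# 4-byte magic, uint16 header size, uint32 flags, 4-byte compression field.
BLOCK_MAGIC = b"\xd3BLK"
COMPRESSION_OFFSET = len(BLOCK_MAGIC) + 2 + 4


def digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the raw file contents."""
    return hashlib.sha256(data).hexdigest()


def block_compressions(data: bytes) -> list:
    """Return the compression field following each block magic in data.

    This is a naive scan: the magic bytes could also occur by chance
    inside binary data, so treat the result as a quick check only.
    """
    found = []
    pos = data.find(BLOCK_MAGIC)
    while pos != -1:
        start = pos + COMPRESSION_OFFSET
        found.append(data[start:start + 4])
        pos = data.find(BLOCK_MAGIC, pos + 1)
    return found
```

To use it on the files from the example, read each one with `pathlib.Path(name).read_bytes()` and pass the bytes to both helpers. Equal digests for test.asdf and test_compressed_col.asdf, together with no `b'zlib'` entry in the compression fields, would match what is reported here.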
I've tried variants of the set_array_compression call, such as
af.set_array_compression(af['table']['col'].base, 'zlib')
af.set_array_compression(af['table']['col'].data, 'zlib')
but I couldn't get any of them to work.
I did dig around in the source code a bit, and it looked to me like it's trying to compare the ultimate ndarray base to check if two arrays are the same, but maybe a copy is being made somewhere that's thwarting this detection.
@braingram or @eslavich what are your opinions?
Thanks for opening this issue!
Unfortunately, this looks to be unsupported without using the legacy extension API (specifically the 'reserve_blocks' hook).
The call to set_array_compression uses the id of the column array to define an internal block, which stores the zlib compression option. However, write_to calls block_manager.find_used_blocks, which looks at all internal blocks (like the one created by the call to set_array_compression) and throws out any blocks that don't appear to be used.
https://github.com/asdf-format/asdf/blob/master/asdf/block.py#L553-L557
find_used_blocks relies on the reserve_blocks hook, which is currently only supported with legacy extensions (asdf-astropy uses the new-style converters), and looks at each node in the tree to see whether that node has blocks that should be kept. No node claims the block created when set_array_compression was called (in this case the table node should claim it, but ASDF does not currently have a way to do this), so the block is thrown out and the compression settings are lost.
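The sequence described above can be illustrated with a toy model. This is not asdf's code: `ToyBlockManager` and its `compression_for` method are hypothetical, and only the two method names taken from the discussion (set_array_compression, find_used_blocks) mirror real ones, in simplified form.

```python
class ToyBlockManager:
    """Hypothetical stand-in for a block manager; not asdf's real class."""

    def __init__(self):
        # Maps id(array) -> compression option, as described above
        self._blocks = {}

    def set_array_compression(self, arr, compression):
        # Record the option in a block keyed by the array's id
        self._blocks[id(arr)] = compression

    def find_used_blocks(self, claimed_arrays):
        # Keep only the blocks whose array was claimed by some tree node
        claimed = {id(a) for a in claimed_arrays}
        self._blocks = {k: v for k, v in self._blocks.items() if k in claimed}

    def compression_for(self, arr):
        # None means the block, and its compression option, is gone
        return self._blocks.get(id(arr))


col = bytearray(8)  # stands in for the column's array

# Case from this issue: no node claims the block before writing
manager = ToyBlockManager()
manager.set_array_compression(col, "zlib")
manager.find_used_blocks(claimed_arrays=[])
lost = manager.compression_for(col)

# What a claiming hook would achieve: the block survives the cleanup
manager = ToyBlockManager()
manager.set_array_compression(col, "zlib")
manager.find_used_blocks(claimed_arrays=[col])
kept = manager.compression_for(col)
```

In the first case the option is dropped before anything is written, which is the behavior reported in this issue. In the second, the setting survives because something claimed the block.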
We (the asdf developers) are currently working on fleshing out the new-style extensions to support all the features of the legacy extension API/type system. This is a good example of a case that we should strive to support.