asdf-format/asdf

Can't open ASDF in FITS


I upgraded asdf and can no longer read files. Example:

$ /blue/adamginsburg/adamginsburg/miniconda3/envs/python39/bin/ipython -c "import asdf; print(asdf.__version__); asdf.open('F200W/pipeline/jw01182004001_04101_00001_nrca1_cal.fits')"
Logging to /blue/adamginsburg/adamginsburg/jwst/brick/ipython_log_2023-11-13.py
[TerminalIPythonApp] WARNING | Config option `ignore_old_config` not recognized by `TerminalIPythonApp`.
Activating auto-logging. Current session state plus future input saved.
Filename       : /blue/adamginsburg/adamginsburg/jwst/brick/ipython_log_2023-11-13.py
Mode           : append
Output logging : False
Raw input log  : False
Timestamping   : False
State          : active
/blue/adamginsburg/adamginsburg/.ipython/profile_default/startup/05_log.py:15: DeprecationWarning: `magic(...)` is deprecated since IPython 0.13 (warning added in 8.1), use run_line_magic(magic_name, parameter_s).
  ip.magic('logstart -o %s append' % filename)
/orange/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/IPython/core/magics/logging.py:130: UserWarning: Couldn't start log: Log file is already active: /blue/adamginsburg/adamginsburg/jwst/brick/ipython_log_2023-11-13.py
  warn("Couldn't start log: %s" % sys.exc_info()[1])
 Logging to /blue/adamginsburg/adamginsburg/jwst/brick/ipython_log_2023-11-13.py
3.0.1
/orange/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/asdf/extension/_converter.py:186: AsdfWarning: Converter handles multiple tags for this extension, but does not implement a select_tag method. This previously worked because Converter subclasses inherited the now removed select_tag. This will be an error in a future version of asdf
  warnings.warn(msg, AsdfWarning)
Traceback (most recent call last):
  Cell In[1], line 1
    import asdf; print(asdf.__version__); asdf.open('F200W/pipeline/jw01182004001_04101_00001_nrca1_cal.fits')
  File /orange/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/asdf/asdf.py:1584 in open_asdf
    return AsdfFile._open_impl(
  File /orange/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/asdf/asdf.py:881 in _open_impl
    return cls._open_asdf(
  File /orange/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/asdf/asdf.py:791 in _open_asdf
    self._file_format_version = cls._parse_header_line(header_line)
  File /orange/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/asdf/asdf.py:709 in _parse_header_line
    raise ValueError(msg)
ValueError: Does not appear to be a ASDF file.

This worked on earlier versions:

>>> import asdf
>>> print(asdf.__version__)
2.15.1
>>> asdf.open('F356W/pipeline/jw01182004001_02101_00001_nrcalong_cal.fits')
<asdf.fits_embed.AsdfInFits at 0x14c75ec01130>

I see that this is intentional (#1288), but it broke my workflows and produced a very surprising error. I'd like to request a clearer error message and a migration guide.
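The ValueError above is raised while asdf parses the first line of the file (_parse_header_line in the traceback). A quick, stdlib-only way to check what a file actually is is to look at its first bytes: ASDF files begin with a header line such as "#ASDF 1.0.0", while FITS files begin with the SIMPLE card of the primary header. The helper below is a minimal sketch of that check; sniff_format is a hypothetical name, not part of asdf or astropy, and it only inspects the file signature.

```python
ASDF_MAGIC = b"#ASDF"      # ASDF files open with a header line like "#ASDF 1.0.0"
FITS_MAGIC = b"SIMPLE  ="  # FITS primary headers open with the SIMPLE card


def sniff_format(path):
    """Guess whether *path* is an ASDF file or a FITS file from its first bytes.

    Returns "asdf", "fits", or "unknown". Only the file signature is checked;
    the rest of the file is not validated.
    """
    with open(path, "rb") as fh:
        head = fh.read(16)
    if head.startswith(ASDF_MAGIC):
        return "asdf"
    if head.startswith(FITS_MAGIC):
        return "fits"
    return "unknown"
```

For a JWST *_cal.fits product like the one above, this should report "fits", which is exactly the kind of file asdf 3.0's asdf.open no longer accepts.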

Thanks for opening the issue. It appears you're using a JWST file. Do you have stdatamodels or the jwst package installed in your environment?

As you've already found, AsdfInFits is no longer supported. Its deprecation in 2.15 included a warning and a description of how to migrate to stdatamodels. In 3.0, AsdfInFits was removed, and the deprecation warning along with it. The docs still contain an explanation of the migration: https://asdf.readthedocs.io/en/latest/asdf/deprecations.html#asdf-in-fits-deprecation
with a link to the stdatamodels documentation for equivalent functions:
https://stdatamodels.readthedocs.io/en/latest/asdf_in_fits.html

Since this appears to be a jwst file, you might consider opening it with the jwst datamodels api:
https://stdatamodels.readthedocs.io/en/latest/jwst/datamodels/models.html#opening-a-file
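For scripts that have to run under both asdf 2.x and 3.x during the transition, one option is to try the old import first and fall back to stdatamodels. This is only a sketch (open_fits_with_asdf is a hypothetical name, not an API of either package), and it does not hide the behavioral differences discussed below: the two branches return different types, and only the 2.x object's write_to produces a FITS file.

```python
def open_fits_with_asdf(path):
    """Open a FITS file that contains an ASDF extension, under asdf 2.x or 3.x.

    Caution: the return types differ. asdf < 3.0 returns an
    asdf.fits_embed.AsdfInFits; stdatamodels.asdf_in_fits.open returns a plain
    asdf.AsdfFile, whose write_to() writes an ASDF file, not a FITS file.
    """
    try:
        # asdf < 3.0: AsdfInFits still exists (deprecated in 2.15)
        from asdf.fits_embed import AsdfInFits
    except ImportError:
        # asdf >= 3.0: AsdfInFits was removed; use the stdatamodels replacement
        from stdatamodels import asdf_in_fits
        return asdf_in_fits.open(path)
    return AsdfInFits.open(path)
```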

That's interesting: the deprecation warning did not appear in 2.15.1, as you can see from the full output I received above. I've verified that there is no warning in a fresh session, too:

$ /blue/adamginsburg/adamginsburg/miniconda3/envs/python39/bin/python -c "import asdf; print(asdf.__version__); asdf.open('F200W/pipeline/jw01182004001_04101_00001_nrca1_cal.fits')"
2.15.2

I'll look more at datamodels. It's unclear to me whether I can write datamodels back out, though. Maybe that's instead handled in https://stdatamodels.readthedocs.io/en/latest/asdf_in_fits.html.

Thanks for sharing the example. Would you try it with -X dev to enable Python Development Mode (which shows deprecation warnings)? In hindsight, we should have made this a more obvious warning to catch uses like yours.
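Since Python 3.7, the default warning filters show a DeprecationWarning only when it is attributed to code running in __main__, so deprecations triggered inside library code are usually hidden. Besides -X dev, they can be enabled on the command line with -W default::DeprecationWarning, or from within a script as in this minimal sketch. Here legacy_call is a hypothetical stand-in for a deprecated library function.

```python
import warnings


def show_deprecation_warnings():
    """Display DeprecationWarnings (once per call site), including ones raised
    inside third-party packages.

    Roughly equivalent to running ``python -W default::DeprecationWarning``.
    """
    warnings.simplefilter("default", DeprecationWarning)


def legacy_call():
    # Hypothetical stand-in for a deprecated library API.
    warnings.warn("legacy_call is deprecated", DeprecationWarning, stacklevel=2)
    return "result"
```

Calling show_deprecation_warnings() near the top of a script, before the library is imported, should also catch warnings that are emitted at import time.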

The datamodels API does include a save method. This is the API used throughout the jwst pipeline, but it is quite different from the simpler AsdfInFits. For your uses, the asdf_in_fits API you linked might also work (and might be easier to use); it is a much closer match to the previous AsdfInFits API.

Please let me know if I can help and sorry for any disruption this caused. We hoped to make the changes in a way that offered advance notice but now I think we should have used a more prominent warning.

Thanks again for opening this. I pinned this issue so that hopefully other folks that encounter this issue will see this discussion.

One jarring difference between asdf and asdf_in_fits is that asdf.write_to(..., overwrite=True) was required to overwrite an existing file, while asdf_in_fits does not accept the overwrite keyword.

Hmmm, I'm not sure I'm testing the same thing you are as I'm unable to replicate this locally.

Both AsdfInFits.write_to and stdatamodels.asdf_in_fits.write pass **kwargs through to astropy.io.fits.HDUList.writeto.

If, with asdf 2.15.2 and stdatamodels 1.4.0, I create an AsdfInFits instance and save it, I am required to pass overwrite=True the second time:

import asdf.fits_embed, stdatamodels.asdf_in_fits

af = asdf.fits_embed.AsdfInFits()
af.write_to('foo.fits')
af.write_to('foo.fits')  # OSError
af.write_to('foo.fits', overwrite=True)  # no error

af = asdf.AsdfFile()
stdatamodels.asdf_in_fits.write('bar.fits', af.tree)
stdatamodels.asdf_in_fits.write('bar.fits', af.tree)  # OSError
stdatamodels.asdf_in_fits.write('bar.fits', af.tree, overwrite=True)  # no error

Would you share an example of the issue you ran into showing which function wasn't accepting overwrite?

OK, I have a few competing problems that all stem from switching from asdf to stdatamodels.

/blue/adamginsburg/adamginsburg/miniconda3/envs/python39/bin/python -c "from stdatamodels import asdf_in_fits as asdf; fa = asdf.open('F444W/pipeline/jw01182004001_04101_00007_nrcalong_destreak.fits'); fa.write_to('test.fits'); fa.write_to('test.fits')"

That works with no errors, but it shouldn't: the second write_to call overwrites test.fits.

$ /blue/adamginsburg/adamginsburg/miniconda3/envs/python39/bin/python -c "from stdatamodels import asdf_in_fits as asdf; fa = asdf.open('F444W/pipeline/jw01182004001_04101_00007_nrcalong_destreak.fits'); fa.write_to('test.fits', overwrite=True);"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/blue/adamginsburg/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/asdf/asdf.py", line 1427, in write_to
    _handle_deprecated_kwargs(config, kwargs)
  File "/blue/adamginsburg/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/asdf/asdf.py", line 1831, in _handle_deprecated_kwargs
    raise TypeError(msg)
TypeError: Unexpected keyword argument 'overwrite'

But I'm also finding that write_to is writing invalid files. More to come, maybe.

To clarify that a bit, I am using 2.15.2 because I reverted asdf.

/blue/adamginsburg/adamginsburg/miniconda3/envs/python39/bin/python -c "import asdf; print(asdf.__version__); import stdatamodels; print(stdatamodels.__version__); from stdatamodels import asdf_in_fits; fa = asdf_in_fits.open('F444W/pipeline/jw01182004001_04101_00007_nrcalong_destreak.fits'); fa.write_to('test.fits', overwrite=True);"
2.15.2
1.8.3
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/blue/adamginsburg/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/asdf/asdf.py", line 1427, in write_to
    _handle_deprecated_kwargs(config, kwargs)
  File "/blue/adamginsburg/adamginsburg/miniconda3/envs/python39/lib/python3.9/site-packages/asdf/asdf.py", line 1831, in _handle_deprecated_kwargs
    raise TypeError(msg)
TypeError: Unexpected keyword argument 'overwrite'

Thanks for sharing the examples. stdatamodels.asdf_in_fits is not a drop-in for AsdfInFits or asdf which explains the errors and file issues you're seeing.

According to the asdf_in_fits docs, open returns an asdf.AsdfFile instance (note that this is not an asdf.fits_embed.AsdfInFits). For your example:

from stdatamodels import asdf_in_fits as asdf
fa = asdf.open('F444W/pipeline/jw01182004001_04101_00007_nrcalong_destreak.fits')
fa.write_to('test.fits')
fa.write_to('test.fits')

fa is an asdf.AsdfFile instance, so calling write_to('test.fits') writes an ASDF file to test.fits (note that even though it has a .fits extension, it is not a FITS file). Instead, you likely want something like:

from stdatamodels import asdf_in_fits  # let's leave this as asdf_in_fits to not confuse this with asdf
fa = asdf_in_fits.open('F444W/pipeline/jw01182004001_04101_00007_nrcalong_destreak.fits')
# here fa is an asdf.AsdfFile instance, if we want to write this to a fits file we need to use asdf_in_fits
asdf_in_fits.write('foo.fits', fa.tree)

The TypeError you shared is due to passing an overwrite argument to AsdfFile.write_to which doesn't support (or need) that argument.
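Because AsdfFile.write_to always writes ASDF, whatever the file extension, one way to avoid accidentally producing an ASDF file named *.fits is a small wrapper that routes FITS-suffixed paths through stdatamodels.asdf_in_fits.write (which forwards extra keyword arguments such as overwrite to astropy) and everything else through write_to. This is a sketch under those assumptions: save_tree and its fits_writer parameter are hypothetical names, and fits_writer exists only so the function can be exercised without stdatamodels installed.

```python
from pathlib import Path

FITS_SUFFIXES = {".fits", ".fit", ".fts"}


def save_tree(af, path, fits_writer=None, **kwargs):
    """Write an asdf.AsdfFile either as ASDF-in-FITS or as plain ASDF.

    Paths with a FITS suffix go through asdf_in_fits.write(path, af.tree, ...);
    anything else goes through af.write_to(path, ...), a plain ASDF file.
    Returns "fits" or "asdf" to say which writer was used.
    """
    if Path(path).suffix.lower() in FITS_SUFFIXES:
        if fits_writer is None:
            # Imported lazily so plain-ASDF saves don't need stdatamodels.
            from stdatamodels import asdf_in_fits
            fits_writer = asdf_in_fits.write
        # With the real writer, e.g. overwrite=True ends up in HDUList.writeto.
        fits_writer(path, af.tree, **kwargs)
        return "fits"
    # Plain ASDF; note that write_to rejects overwrite= (see the TypeError above).
    af.write_to(path, **kwargs)
    return "asdf"
```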

Thanks, that makes some sense. But what I'm gathering is that there's not a drop-in replacement for AsdfInFits, so I have to completely rethink my code if I upgrade to asdf 3.0. I don't understand the asdf data model enough to parse these instructions:
https://stdatamodels.readthedocs.io/en/latest/asdf_in_fits.html

What I'm trying to do, in asdf 2.15.2 language, is this:

af = asdf.fits_embed.AsdfInFits.open('F444W/pipeline/jw01182004001_04101_00007_nrcalong_destreak.fits')
af.write_to('test.fits', overwrite=True)

I think I have to do something like:

from stdatamodels import asdf_in_fits  # let's leave this as asdf_in_fits to not confuse this with asdf
from astropy.io import fits
filename = 'F444W/pipeline/jw01182004001_04101_00007_nrcalong_destreak.fits'
fa = asdf_in_fits.open(filename)
fh = fits.open(filename)
asdf_in_fits.write(filename='foo.fits', tree=fa.tree, hdulist=fh)

but this makes me uncomfortable on many levels:

  • it's no longer writing out an object with known properties, instead using a function to compose a new file
  • it is unclear what I should expect the order of HDUs to be in the resulting file
  • the pattern requires more steps and is not similar to the old pattern

The example you shared for updating your code to use asdf_in_fits looks OK but could be simpler (supplying an hdulist is optional). Did this example work for you?

from stdatamodels import asdf_in_fits
fa = asdf_in_fits.open('F444W/pipeline/jw01182004001_04101_00007_nrcalong_destreak.fits')
asdf_in_fits.write('foo.fits', fa.tree)

This is very similar to the asdf 2.15 example you shared. The biggest change is that write is now a function in the asdf_in_fits module instead of write_to being a method on an AsdfInFits instance. This change was necessary because there is no longer any AsdfInFits class. I don't quite understand your comment that 'it's no longer writing out an object with known properties'. Whether this is a function in a module or a method on a class, both produce equivalent files. As far as I'm aware, neither AsdfInFits nor asdf_in_fits makes any guarantee about the order of HDUs (other than the ASDF HDU appearing last).

OK, thanks. It wasn't obvious to me that asdf_in_fits.open preserved all of the FITS HDUs - I had (incorrectly) assumed that it was extracting the ASDF HDU from the file and therefore that the FITS HDUs had to be manually added back in.

asdf_in_fits.open will convert the FITS HDUs referenced in the ASDF tree to arrays when constructing the AsdfFile instance. I expect this is not what you're hoping for, given your last comment. For example, if I inspect the structure of an example jwst file (using HDUList.info), I get the following:

>>> ff = astropy.io.fits.open('jw01024001001_04101_00001_mirifulong_rate.fits')
>>> ff.info()
Filename: jw01024001001_04101_00001_mirifulong_rate.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     234   ()
  1  SCI           1 ImageHDU        60   (1032, 1024)   float32
  2  ERR           1 ImageHDU        10   (1032, 1024)   float32
  3  DQ            1 ImageHDU        11   (1032, 1024)   int32 (rescales to uint32)
  4  VAR_POISSON    1 ImageHDU         9   (1032, 1024)   float32
  5  VAR_RNOISE    1 ImageHDU         9   (1032, 1024)   float32
  6  ASDF          1 BinTableHDU     11   1R x 1C   [7076B]

If I open this file with asdf_in_fits.open and then write it with asdf_in_fits.write (without mapping the data arrays to HDUs), asdf_in_fits will not automatically create HDUs for the data arrays, as it does not know where any given array should be written. Instead, the arrays end up inside the ASDF extension, which is why it grows from 7076 bytes to 21142735 bytes below.

>>> af = stdatamodels.asdf_in_fits.open('jw01024001001_04101_00001_mirifulong_rate.fits')
>>> stdatamodels.asdf_in_fits.write('foo.fits', af.tree)
>>> ff = astropy.io.fits.open('foo.fits')
>>> ff.info()
Filename: foo.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU       4   ()
  1  ASDF          1 BinTableHDU     11   1R x 1C   [21142735B]

As this is a jwst file there are many benefits to using the stdatamodels.jwst.datamodels interface (including mapping data to HDUs and tree meta data to fits keywords). Using the same example file above:

>>> import stdatamodels.jwst.datamodels
>>> m = stdatamodels.jwst.datamodels.open('jw01024001001_04101_00001_mirifulong_rate.fits')
>>> m.save('foo.fits')
>>> ff = astropy.io.fits.open('foo.fits')
>>> ff.info()
Filename: foo.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     234   ()
  1  SCI           1 ImageHDU        60   (1032, 1024)   float32
  2  ERR           1 ImageHDU        10   (1032, 1024)   float32
  3  DQ            1 ImageHDU        11   (1032, 1024)   int32 (rescales to uint32)
  4  VAR_POISSON    1 ImageHDU         9   (1032, 1024)   float32
  5  VAR_RNOISE    1 ImageHDU         9   (1032, 1024)   float32
  6  ASDF          1 BinTableHDU     11   1R x 1C   [7038B]
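To spot this kind of loss after a round trip, it can help to compare the extension names before and after writing. A minimal pure-Python sketch, assuming you collect the names yourself, e.g. with [hdu.name for hdu in astropy.io.fits.open(path)]; compare_extensions is a hypothetical helper, not part of astropy or stdatamodels. It compares names only, so files with repeated EXTNAMEs (distinguished by EXTVER) would need (name, ver) tuples instead, which the same function accepts.

```python
def compare_extensions(before, after):
    """Compare two lists of HDU names, e.g. from a file before and after a
    round trip.

    Returns (missing, added, order_kept): names present only in *before*,
    names present only in *after*, and whether the names common to both
    appear in the same relative order.
    """
    missing = [name for name in before if name not in after]
    added = [name for name in after if name not in before]
    common_before = [name for name in before if name in after]
    common_after = [name for name in after if name in before]
    return missing, added, common_before == common_after
```

With the listings above, the asdf_in_fits round trip reports SCI, ERR, DQ, VAR_POISSON and VAR_RNOISE as missing, while the datamodels round trip reports nothing missing and the order kept.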

Where is stdatamodels.jwst documented? I get a null result here:
https://stdatamodels.readthedocs.io/en/latest/search.html?q=jwst&check_keywords=yes&area=default
I guess we have to go to the JWST models?
https://jwst-pipeline.readthedocs.io/en/latest/jwst/user_documentation/datamodels.html

I'll investigate, but I need access to the ASDF and to the FITS objects, which seems to take quite some digging to find in the datamodels.

I can't make much sense of the JWST data models. They're documented:
https://stdatamodels.readthedocs.io/en/latest/jwst/datamodels/index.html
but the fundamental object I'm actually working with is the GWCS, which I can't find in the data model but can easily find in the ASDF.

af = asdf.fits_embed.AsdfInFits.open('F444W/pipeline/jw01182004001_04101_00007_nrcalong_destreak.fits')
af.tree['meta']['wcs']

vs

af = stdatamodels.jwst.datamodels.open('F444W/pipeline/jw01182004001_04101_00007_nrcalong_destreak.fits')
af.info() # I can see some features here, like meta:
meta = af['meta']
dir(meta) # shows me that there is a wcsinfo attribute
meta.wcsinfo

That meta.wcsinfo isn't a GWCS instance, unlike meta.wcs in the ASDF tree. I don't see WCS referenced anywhere in the JWST datamodel (except the spectral WCS, which is not relevant here).

Would you open an issue over at stdatamodels? https://github.com/spacetelescope/stdatamodels/issues

I'd like to leave this issue open in case other folks run into similar errors.

Huh, my approach for searching for keywords failed. There is a meta.wcs attribute, and that's the thing to modify, following:
https://github.com/spacetelescope/jwst/blob/9dfaa37241e86e8feaa264de083ad68c85bcac08/jwst/scripts/adjust_wcs.py#L250
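Following the comment above that meta.wcs is the attribute to modify, a minimal sketch of the open, modify, save cycle through the datamodels API might look like this. It assumes that the model returned by stdatamodels.jwst.datamodels.open can be used as a context manager and that assigning a new GWCS object to model.meta.wcs before model.save is sufficient; update_wcs and its modify callback are hypothetical names, and the opener parameter exists only so the function can be tried without stdatamodels installed.

```python
def update_wcs(path, out_path, modify, opener=None):
    """Open a JWST file as a datamodel, replace meta.wcs with
    modify(meta.wcs), and save the result to out_path.

    modify is any callable (supplied by you) that takes the current GWCS
    object and returns the adjusted one.
    """
    if opener is None:
        # Imported lazily so the function can be tested with a stand-in opener.
        from stdatamodels.jwst import datamodels
        opener = datamodels.open
    with opener(path) as model:
        model.meta.wcs = modify(model.meta.wcs)
        model.save(out_path)
```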

@keflavich it would be helpful to us to know whether you are using JWST data or not. The datamodels solution is likely not very useful for non-JWST data, and adding a more generic way of dealing with ASDF in FITS is on our todo list. One question about that: do you want array data within the ASDF stored automatically in FITS extensions or not?

I am working with JWST data, and I have no strong opinions: this is my first time working with GWCS and ASDF, and I'm only touching the ASDF parts because I need to modify them at a point along the JWST pipeline. My use case is therefore quite limited, and I was seeking the most expedient way to make those adjustments. I suspect my problems above all stem from copying code from someone/somewhere else that should have used datamodels under the hood but instead used low-level asdf access, which forced me to change things after an update.

That said, I'd really like there to be symmetry between asdf io and fits io when possible. e.g., I should be able to trust that the HDU order stays the same if I read in/write out HDUs, the write methods should have the same generic conventions (that are now common across astropy?), etc.