esheldon/fitsio

prevent writing of completely empty files

Opened this issue · 20 comments

Creating a new FITS file with mode rw and not doing anything with the file results in a corrupt FITS file:

>>> from fitsio import FITS
>>> fits = FITS('test.fits', mode='rw')
>>> fits.close()
>>> fits = FITS('test.fits', mode='rw')
Traceback (most recent call last):
...
This does not look like a FITS file.

You need to write something. To create an empy first HDU, write None

import fitsio

fname = 'test.fits'
with fitsio.FITS(fname, mode='rw', clobber=True) as fits:
    fits.write(None)

with fitsio.FITS(fname) as fits:
    assert len(fits) > 0
    assert not fits[0].has_data()

While writing an empty PHDU at the beginning works around the issue, I still believe that opening a file and not doing anything to the object should either not create the file, or should create a file that is readable (e.g. add the empty PHDU automatically). I found it odd that one can so easily engineer a file using fitsio that fitsio itself then cannot read.

I converted this to a feature request

There is something to discuss about this.

If we implemented this feature request then it would not be possible to update the file at a later date to contain data in the first HDU

There are probably two ways to do deal with this so it makes sens

  1. Automatically write an empty extension if the user never wrote anything.
  2. raise an exception if the user never wrote anything

My preference would be 2, but I'm happy to discuss

Is it not feasible to simply unlink the new file again if nothing was written to it? Using the open handle so there is no issue with something else having written over the file in the meantime

That would not match expectations for python file opening; if you use open(fname, 'w') it will create the file

I think we have to write the file with zero bytes and then on opening either delete it if it is zero length for rw or build a read-only emulation of an empty fits file in the python layers.

In which case I would say option 1 with the empty PHDU written just before closing an empty file. The user can never add data into the first HDU of an invalid file anyway.

We could also raise for raise for read attempts on empty files

The code does raise OSError on trying to open an empty file with mode read only

I will argue that If you open a file with mode rw and don't write anything it is a bug

Fair point on a that being a bug.

I guess the question is whether we should or should not match the python vs fits file semantics.

We should decide on that and then make the behavior in the code match.

Note this is underlying cfitsio behavior not fitsio behavior; the same thing happens if you open a file with cfitsio and don't write anything, it leaves an empty file and it can't read it!

In that case I’m happy to close this as wontfix. I still find it odd, but it’s probably not worth it to work around cfitsio quirks.

Ok. I agree then that adding an explicit error that is raised if an empty file is closed, while not pythonic, is the best way to match cfitsio in a user friendly way.

If we'd rather match python semantics, then we'd need to put in special handling for empty files.

If I understand your meaning, I think that what we do now matches python semantics: If you open a file for write, and don't write anything, it just leaves an empty file, like open()

This matches cfitsio too

But this is structured data and that is invalid FITS. So we could take a stance and say "no you can't write an invalid file"

This will be a breaking change for someone out there.

It is not fully python semantics currently IIUIC. Opening an empty file for rw is valid for a generic python file.

Ack you are right that if we error on write this won't match cfitsio.

As I said, I didn't realise that cfitsio has this same behaviour, so I'm happy with no change here.

If change is desirable, then I think you only need to decide whether opening a FITS for writing and closing it again should be a NOP or create an empty FITS file. It it means the former then you can safely remove empty files, and If it means the latter then it should write the None extension of an empty FITS file, since a completely empty file does not map to anything in the FITS universe.

detecting the empty file would require a bit of care, since cfitsio buffers.