prevent writing of completely empty files
Opened this issue · 20 comments
Creating a new FITS file with mode rw
and not doing anything with the file results in a corrupt FITS file:
>>> from fitsio import FITS
>>> fits = FITS('test.fits', mode='rw')
>>> fits.close()
>>> fits = FITS('test.fits', mode='rw')
Traceback (most recent call last):
...
This does not look like a FITS file.
You need to write something. To create an empy first HDU, write None
import fitsio
fname = 'test.fits'
with fitsio.FITS(fname, mode='rw', clobber=True) as fits:
fits.write(None)
with fitsio.FITS(fname) as fits:
assert len(fits) > 0
assert not fits[0].has_data()
While writing an empty PHDU at the beginning works around the issue, I still believe that opening a file and not doing anything to the object should either not create the file, or should create a file that is readable (e.g. add the empty PHDU automatically). I found it odd that one can so easily engineer a file using fitsio that fitsio itself then cannot read.
I converted this to a feature request
There is something to discuss about this.
If we implemented this feature request then it would not be possible to update the file at a later date to contain data in the first HDU
There are probably two ways to do deal with this so it makes sens
- Automatically write an empty extension if the user never wrote anything.
- raise an exception if the user never wrote anything
My preference would be 2, but I'm happy to discuss
Is it not feasible to simply unlink the new file again if nothing was written to it? Using the open handle so there is no issue with something else having written over the file in the meantime
That would not match expectations for python file opening; if you use open(fname, 'w')
it will create the file
I think we have to write the file with zero bytes and then on opening either delete it if it is zero length for rw or build a read-only emulation of an empty fits file in the python layers.
In which case I would say option 1 with the empty PHDU written just before closing an empty file. The user can never add data into the first HDU of an invalid file anyway.
We could also raise for raise for read attempts on empty files
The code does raise OSError
on trying to open an empty file with mode read only
I will argue that If you open a file with mode rw
and don't write anything it is a bug
Fair point on a that being a bug.
I guess the question is whether we should or should not match the python vs fits file semantics.
We should decide on that and then make the behavior in the code match.
Note this is underlying cfitsio behavior not fitsio behavior; the same thing happens if you open a file with cfitsio and don't write anything, it leaves an empty file and it can't read it!
In that case I’m happy to close this as wontfix. I still find it odd, but it’s probably not worth it to work around cfitsio quirks.
Ok. I agree then that adding an explicit error that is raised if an empty file is closed, while not pythonic, is the best way to match cfitsio in a user friendly way.
If we'd rather match python semantics, then we'd need to put in special handling for empty files.
If I understand your meaning, I think that what we do now matches python semantics: If you open a file for write, and don't write anything, it just leaves an empty file, like open()
This matches cfitsio too
But this is structured data and that is invalid FITS. So we could take a stance and say "no you can't write an invalid file"
This will be a breaking change for someone out there.
It is not fully python semantics currently IIUIC. Opening an empty file for rw
is valid for a generic python file.
Ack you are right that if we error on write this won't match cfitsio.
As I said, I didn't realise that cfitsio has this same behaviour, so I'm happy with no change here.
If change is desirable, then I think you only need to decide whether opening a FITS for writing and closing it again should be a NOP or create an empty FITS file. It it means the former then you can safely remove empty files, and If it means the latter then it should write the None
extension of an empty FITS file, since a completely empty file does not map to anything in the FITS universe.
detecting the empty file would require a bit of care, since cfitsio buffers.