ccpem/mrcfile

Writing/reading extended header seems to be broken in latest version

jmp1985 opened this issue · 5 comments

Hi @colinpalmer

I am having an issue with the latest version of mrcfile on pypi. It looks like there is some issue with the extended header being written.

If I write an mrcfile and then try to read the file back, the extended header does not seem to be interpreted correctly. Instead of an array of records I just get something like "[b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' ... ] as the extended header.

Indeed, it looks like the bug existed before this version. However, previously it only affected files not written using the new_mmap function. I never noticed this because I have been writing mrcfile using new_mmap; however, in the new version of mrcfile the behaviour is consistent but unfortunately consistently broken!

Here is some python code to reproduce the problem:

import mrcfile
import numpy as np

def write():

    N = 10
    header_dtype = mrcfile.dtypes.get_ext_header_dtype(b"FEI1")
    
    data = np.zeros((N, 10, 10), dtype="float16")
    header = np.zeros(N, dtype=header_dtype)

    handle = mrcfile.new_mmap(
        "test_mmap.mrc",
        shape=data.shape,
        mrc_mode=mrcfile.utils.mode_from_dtype(data.dtype),
        overwrite=True,
        extended_header=header,
        exttyp="FEI1",
    )
    
    handle = mrcfile.new("test.mrc", overwrite=True)
    handle.set_data(data)
    handle.set_extended_header(header)



def read():
    
    print("TEST: test_mmap.mrc")
    handle = mrcfile.open("test_mmap.mrc")
    print(handle.extended_header.dtype)
    print(handle.extended_header)
    print("") 
    print("TEST: test.mrc")
    handle = mrcfile.open("test.mrc")
    print(handle.extended_header.dtype)
    print(handle.extended_header)


if __name__ == '__main__':
    write()
    read()

In version v1.4.3 when reading test_mmap.mrc I get the following output:

TEST: test_mmap.mrc
[('Metadata size', '<i4'), ('Metadata version', '<i4'), ('Bitmask 1', '<u4'), ('Timestamp', '<f8'), ('Microscope type', 'S16'), ('D-Number', 'S16'), ('Application', 'S16'), ('Application version', 'S16'), ('HT', '<f8'), ('Dose', '<f8'), ('Alpha tilt', '<f8'), ('Beta tilt', '<f8'), ('X-Stage', '<f8'), ('Y-Stage', '<f8'), ('Z-Stage', '<f8'), ('Tilt axis angle', '<f8'), ('Dual axis rotation', '<f8'), ('Pixel size X', '<f8'), ('Pixel size Y', '<f8'), ('Unused range', 'S48'), ('Defocus', '<f8'), ('STEM Defocus', '<f8'), ('Applied defocus', '<f8'), ('Instrument mode', '<i4'), ('Projection mode', '<i4'), ('Objective lens mode', 'S16'), ('High magnification mode', 'S16'), ('Probe mode', '<i4'), ('EFTEM On', '?'), ('Magnification', '<f8'), ('Bitmask 2', '<u4'), ('Camera length', '<f8'), ('Spot index', '<i4'), ('Illuminated area', '<f8'), ('Intensity', '<f8'), ('Convergence angle', '<f8'), ('Illumination mode', 'S16'), ('Wide convergence angle range', '?'), ('Slit inserted', '?'), ('Slit width', '<f8'), ('Acceleration voltage offset', '<f8'), ('Drift tube voltage', '<f8'), ('Energy shift', '<f8'), ('Shift offset X', '<f8'), ('Shift offset Y', '<f8'), ('Shift X', '<f8'), ('Shift Y', '<f8'), ('Integration time', '<f8'), ('Binning Width', '<i4'), ('Binning Height', '<i4'), ('Camera name', 'S16'), ('Readout area left', '<i4'), ('Readout area top', '<i4'), ('Readout area right', '<i4'), ('Readout area bottom', '<i4'), ('Ceta noise reduction', '?'), ('Ceta frames summed', '<i4'), ('Direct detector electron counting', '?'), ('Direct detector align frames', '?'), ('Camera param reserved 0', '<i4'), ('Camera param reserved 1', '<i4'), ('Camera param reserved 2', '<i4'), ('Camera param reserved 3', '<i4'), ('Bitmask 3', '<u4'), ('Camera param reserved 4', '<i4'), ('Camera param reserved 5', '<i4'), ('Camera param reserved 6', '<i4'), ('Camera param reserved 7', '<i4'), ('Camera param reserved 8', '<i4'), ('Camera param reserved 9', '<i4'), ('Phase Plate', '?'), ('STEM Detector name', 'S16'), ('Gain', '<f8'), ('Offset', '<f8'), ('STEM param reserved 0', '<i4'), ('STEM param reserved 1', '<i4'), ('STEM param reserved 2', '<i4'), ('STEM param reserved 3', '<i4'), ('STEM param reserved 4', '<i4'), ('Dwell time', '<f8'), ('Frame time', '<f8'), ('Scan size left', '<i4'), ('Scan size top', '<i4'), ('Scan size right', '<i4'), ('Scan size bottom', '<i4'), ('Full scan FOV X', '<f8'), ('Full scan FOV Y', '<f8'), ('Element', 'S16'), ('Energy interval lower', '<f8'), ('Energy interval higher', '<f8'), ('Method', '<i4'), ('Is dose fraction', '?'), ('Fraction number', '<i4'), ('Start frame', '<i4'), ('End frame', '<i4'), ('Input stack filename', 'S80'), ('Bitmask 4', '<u4'), ('Alpha tilt min', '<f8'), ('Alpha tilt max', '<f8')]
[(0, 0, 0, 0., b'', b'', b'', b'', 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., b'', 0., 0., 0., 0, 0, b'', b'', 0, False, 0., 0, 0., 0, 0., 0., 0., b'', False, False, 0., 0., 0., 0., 0., 0., 0., 0., 0., 0, 0, b'', 0, 0, 0, 0, False, 0, False, False, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, False, b'', 0., 0., 0, 0, 0, 0, 0, 0., 0., 0, 0, 0, 0, 0., 0., b'', 0., 0., 0, False, 0, 0, 0, b'', 0, 0., 0.)]

TEST: test.mrc
|V1
[b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' ... ] # truncated for brevity

In version 1.5.0 I get this:

TEST: test_mmap.mrc
|V1
[b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' ... ] # truncated for brevity

TEST: test.mrc
|V1
[b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' b'\x00' ... ] # truncated for brevity

Did you see the release notes for v1.5? You might just need to switch your code to use indexed_extended_header instead of extended_header (or support both versions by doing a hasattr check for indexed_extended_header if you need to).

Hi Colin,

Ah, thanks, this could be it. However, if I change to use indexed_extended_header then I get another different error. For example changing my read code above to:

def read():
    
    print("TEST: test_mmap.mrc")
    handle = mrcfile.open("test_mmap.mrc")
    # print(handle.indexed_extended_header.dtype)
    print(handle.indexed_extended_header)
    print("") 
    print("TEST: test.mrc")
    handle = mrcfile.open("test.mrc")
    # print(handle.indexed_extended_header.dtype)
    print(handle.indexed_extended_header)

:
I get the following output:

TEST: test_mmap.mrc
/home/upc86896/Software/env/dev/lib/python3.11/site-packages/mrcfile/mrcobject.py:220: RuntimeWarning: The header has exttyp 'b'FEI1'' but the extended header cannot be interpreted as that type
  warnings.warn("The header has exttyp '{}' but the extended header "
None

TEST: test.mrc
Traceback (most recent call last):
  File "/home/upc86896/Data/2024/Test_MrcFile/test.py", line 42, in <module>
    read()
  File "/home/upc86896/Data/2024/Test_MrcFile/test.py", line 37, in read
    print(handle.indexed_extended_header)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/upc86896/Software/env/dev/lib/python3.11/site-packages/mrcfile/mrcobject.py", line 213, in indexed_extended_header
    if self.extended_header.size < dtype.itemsize:
                                   ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'itemsize'

The behaviour is different for the file created with new_mmap() and new() but both give errors about interpretting the extended header data.

Is it obvious what I might be doing wrong?

The new code that interprets the indexed extended header does some checking to make sure it will work properly, so you need to set some things appropriately. I can make your script work by adding a couple of lines to the write() function, to set the metadata size and exttyp fields:

 def write():
 
     N = 10
     header_dtype = mrcfile.dtypes.get_ext_header_dtype(b"FEI1")
     
     data = np.zeros((N, 10, 10), dtype="float16")
     header = np.zeros(N, dtype=header_dtype)
+    header["Metadata size"] = header_dtype.itemsize
 
     handle = mrcfile.new_mmap(
         "test_mmap.mrc",
         shape=data.shape,
         mrc_mode=mrcfile.utils.mode_from_dtype(data.dtype),
         overwrite=True,
         extended_header=header,
         exttyp="FEI1",
     )
     
     handle = mrcfile.new("test.mrc", overwrite=True)
     handle.set_data(data)
     handle.set_extended_header(header)
+    handle.header.exttyp = b'FEI1'

Hi Colin

Thanks for this! This fixes the issue and my code now works again!

However, the new behaviour is somewhat more complicated than the old behaviour. Would it be possible to simplify a bit? Perhaps it would be possible to set the exttype automatically based on the input of set_extended_header() and indeed the new_mmap constructor. Not sure what to do about the "Metadata size" bit though.

the new behaviour is somewhat more complicated than the old behaviour

I'm not sure I agree with this. As you found in the old version, you could set an extended header but if you didn't set the exttyp to match it wouldn't load it back in properly. And mrcfile might have read an FEI1 header ok without "Metadata size" set but there's a good chance that other software might fail to read the same file.

Would it be possible to simplify a bit? Perhaps it would be possible to set the exttype automatically based on the input of set_extended_header() and indeed the new_mmap constructor. Not sure what to do about the "Metadata size" bit though.

We could probably add a helper function to create a new FEI-style extended header, which could make the array with the right dtype and set the "Metadata size" (and maybe any other fields that should be set to make it compliant with the TFS spec). It might be a little awkward to get it working nicely for both normal and mmap files, but I expect it's doable. Realistically I won't have time to get to this any time soon though. Feel free to submit a PR if you like 😉