ccpem/mrcfile

Gemmi vs mrcfile - values of some voxels are not equal

Closed this issue · 3 comments

Dear all,

First of all, thank you for developing this package, it really comes in handy while reading big maps.

Recently, I've noticed that values of some voxels provided by gemmi and mrcfile are not equal while reading the same map.
Could you think of why it can happen?

Map file: EMD-1832.zip

from pathlib import Path
from typing import Tuple
import gemmi
import mrcfile
import numpy as np


def get_np_arr_from_gemmi(map_path: Path) -> np.ndarray:
    path = str(map_path.resolve())
    m = gemmi.read_ccp4_map(path)
    arr = np.array(m.grid)
    print(f'gemmi arr: dtype={arr.dtype}')
    return arr

def get_np_arr_from_mrcfile(map_path: Path) -> np.ndarray:
    path = str(map_path.resolve())
    m = mrcfile.open(path, mode='r')
    arr = m.data
    print(f'mrcfile arr: dtype={arr.dtype}')
    return arr

def compare_two_arrays(arr1, arr2):
    mask = arr1 == arr2
    reversed_mask = np.invert(mask)
    print(f'Not equal indices: {reversed_mask.nonzero()}')

def get_value_from_two_arrs(arr1, arr2, index: Tuple[int,int,int]):
    val_1 = arr1[index[0], index[1], index[2]]
    val_2 = arr2[index[0], index[1], index[2]]
    return val_1, val_2

if __name__ == '__main__':
    MAP_FILEPATH = Path('EMD-1832.map')
    arr1 = get_np_arr_from_gemmi(MAP_FILEPATH)
    arr2 = get_np_arr_from_mrcfile(MAP_FILEPATH)
    compare_two_arrays(arr1, arr2)
    # try to get value of (62, 31, 31) voxel
    indices = (62, 31, 31)
    gemmi_val, mrcfile_val = get_value_from_two_arrs(arr1, arr2, indices)
    print(f'indices: {indices}')
    print(f'gemmi_val {gemmi_val}')
    print(f'mrcfile_val {mrcfile_val}')

    indices = (1, 31, 32)
    gemmi_val, mrcfile_val = get_value_from_two_arrs(arr1, arr2, indices)
    print(f'indices: {indices}')
    print(f'gemmi_val {gemmi_val}')
    print(f'mrcfile_val {mrcfile_val}')

Output:

gemmi arr: dtype=float32
mrcfile arr: dtype=float32
Not equal indices: (array([ 1,  2,  2, ..., 61, 61, 62]), array([31, 24, 24, ..., 38, 38, 31]), array([32, 29, 30, ..., 33, 34, 31]))
indices: (62, 31, 31)
gemmi_val 0.09139659255743027
mrcfile_val 0.0
indices: (1, 31, 32)
gemmi_val 0.0

Best regards,
Aliaksei

Thanks for reporting this, I'll look into it. I see you raised the same issue for Gemmi as well so let's have a link so we can keep track of them both together: project-gemmi/gemmi#210

I've had a look at this and identified the cause of the difference. Gemmi (in common with much other crystallographic software) reads data into a grid using the FORTRAN axis convention: the fastest axis corresponds to the first index and the slowest to the last, so when you index into it the coordinates are interpreted as arr[x, y, z]. mrcfile (in common with Python and various other Python packages) instead uses the C axis convention which is the opposite, so coordinates are interpreted as arr[z, y, x].

The code you presented can be fixed by simply transposing one of the arrays, for example in get_np_arr_from_gemmi() you can replace arr = np.array(m.grid) with arr = np.array(m.grid).transpose() and then we find there are no differences between the Gemmi and mrcfile arrays.

The Gemmi documentation is clear that the data grid is Fortran-style contiguous. The mrcfile documentation implicitly shows that the arrays are C-style contiguous through the examples but it is not spelt out explicitly, so I will add a note to that effect.

In general, you can check the axis order in a numpy array using its flags, for example with your code as given above, I can add the following lines:

     arr1 = get_np_arr_from_gemmi(MAP_FILEPATH)
     arr2 = get_np_arr_from_mrcfile(MAP_FILEPATH)
+    print(f'arr1: f_contiguous={arr1.flags.f_contiguous} c_contiguous={arr1.flags.c_contiguous}')
+    print(f'arr2: f_contiguous={arr2.flags.f_contiguous} c_contiguous={arr2.flags.c_contiguous}')
     compare_two_arrays(arr1, arr2)

to get the following output:

arr1: f_contiguous=True c_contiguous=False
arr2: f_contiguous=False c_contiguous=True

Finally, just to note that as discussed in #23, mrcfile does not currently swap the axes in response to the header's mapc, mapr and maps fields. That might well change in future, which could affect the array indexing – depending on the implementation, it could mean arrays read by mrcfile would not necessarily be C-contiguous. I expect (but I'm not sure) that Gemmi probably already handles this kind of axis swapping.

Thank you for the clarification and detailed response. I highly appreciate it.