DC-analysis/dclab

Bug: Events have principal inertia ratio <1

Closed this issue · 19 comments

dclab version 0.48.1
Windows 10
python 3.9.0

When plotting the principal inertia ratio for a dataset in Shape-Out 2, I noticed values below 1, which should not be there, since the principal inertia ratio calculates the ratio of major over minor axis and should be per definition invariant to rotation.

The selected event looks normal and is one of the events with principal inertia axis < 1

grafik

The algorithm for computing the principal inertia ratio did not change recently.
So this bug probably has been there since the start.

The correlation with the position along the channel axis is interesting.

I assume the principal inertia ratio is here computed on-the-fly by dclab, correct?

How can I check if the principal inertia ratio is computed on the fly?
If I check the file with h5py, like so

h = h5py.File(some_rtdc_file.rtdc)
h["events"].keys()
>>> <KeysViewHDF5 ['area_msd', 'area_ratio', 'area_um', 'aspect', 'bright_avg', 'bright_sd', 'circ', 'contour', 'corr_deform', 'deform', 'fl1_area', 'fl1_max', 'fl1_pos', 'fl1_width', 'fl2_area', 'fl2_max', 'fl2_pos', 'fl2_width', 'fl3_area', 'fl3_max', 'fl3_pos', 'fl3_width', 'frame', 'image', 'index', 'index_online', 'inert_ratio_cvx', 'inert_ratio_raw', 'mask', 'pos_x', 'pos_y', 'size_x', 'size_y', 'temp', 'time', 'trace', 'volume']>

I don't see inert_ratio_prnc there.

The correlation with the position along the channel axis is interesting indeed. Also visible in other datasets:

grafik

grafik

I wonder, is the image width here longer than 256? Maybe this is some kind of integer overflow...

Do you have an example dataset I can take a look at? This does not appear to happen for normal datasets with short image widths.

Yes, the region of interest is quite long. I attached an example file with 300 events.
example_file_dclab_issue_212.zip

I don't think the position matters:
I created a new file only with events with principal inertia ratio <1, then modified the file with h5py basically subtracting 200 from all position_x values and then checked the principal inertia ratio again. The plot looks basically the same.

Code:

import dclab
import h5py

ds_filtered = dclab.new_dataset("input_data.rtdc")
ds_filtered.config["filtering"]["inert_ratio_prnc min"] = 0
ds_filtered.config["filtering"]["inert_ratio_prnc max"] = 1
ds_filtered.apply_filter()
features = ds_filtered.features
if "inert_ratio_prnc" in features:
    features.remove('inert_ratio_prnc')
ds_filtered.export.hdf5(path='events_false_principal_inertia_ratio.rtdc', features=features, filtered=True, override=True)

with h5py.File('events_false_principal_inertia_ratio.rtdc', mode="r+") as h:
    for i in range(len(h['events']['pos_x'])):
        h['events']['pos_x'][i] = h['events']['pos_x'][i] - 200

ds_reduced_position = dclab.new_dataset("events_false_principal_inertia_ratio.rtdc")
plt.figure()
plt.plot(ds_reduced_position["pos_x"], ds_reduced_position["inert_ratio_prnc"], ".")
plt.ylim([0, 1])

Before:
before_modification_of_position

After:
after_modification_of_position

I compared the central moments as computed by dclab with the ones computed by cv2. I used the following code for calculation and plotting. @SaraKaliman implemented the calculation of tilt angle and central moments in principal axes:

import dclab
import pandas as pd
import cv2
import numpy as np
import matplotlib.pyplot as plt


ds = dclab.new_dataset(r"example_file_dclab_issue_212.rtdc")
contours = ds["contour"][:]

features_names = ["inert_ratio_prnc_new", "tilt_new_raw",
                  "mu11_new", "mu02_new", "mu20_new", "mu11_dclab",
                  "mu02_dclab", "mu20_dclab"]

new_features = pd.DataFrame()

for n in range(len(contours)):
    cont_raw = contours[n]

    # moments up to 3rd order
    mu_raw = cv2.moments(cont_raw)
    mu_raw_dclab = dclab.features.inert_ratio.cont_moments_cv(cont_raw)

    # calculate moments on raw contour
    Ixx_raw = np.float64(mu_raw["mu02"])
    Iyy_raw = np.float64(mu_raw["mu20"])
    Ixy_raw = np.float64(mu_raw["mu11"])

    # tilt angle in radians
    angle_fi_raw = 0.5 * np.arctan2(-2 * Ixy_raw, Iyy_raw - Ixx_raw)

    # Moments of Inertia / Central moments in principal axes
    I1_raw = 0.5 * (Ixx_raw + Iyy_raw + np.sqrt(
        (Ixx_raw - Iyy_raw) ** 2 + 4 * (Ixy_raw ** 2)))
    I2_raw = 0.5 * (Ixx_raw + Iyy_raw - np.sqrt(
        (Ixx_raw - Iyy_raw) ** 2 + 4 * (Ixy_raw ** 2)))

    cell_features = np.array([[np.sqrt(I1_raw / I2_raw),
                               angle_fi_raw,
                               mu_raw["mu11"],
                               mu_raw["mu02"],
                               mu_raw["mu20"],
                               mu_raw_dclab["mu11"],
                               mu_raw_dclab["mu02"],
                               mu_raw_dclab["mu20"]]])
    cell_features = pd.DataFrame(cell_features, columns=features_names)

    if n == 0:
        new_features = cell_features
    else:
        new_features = pd.concat([new_features, cell_features], ignore_index=True)

df = pd.DataFrame()
for feature in ds.features:
    if feature == "trace":
        pass
    else:
        df[feature] = ds[feature][:]
for feature in features_names:
    df[feature] = new_features[feature]

# inert_ratio_prnc_new
plt.figure()
plt.plot(df["pos_x"], df["inert_ratio_prnc"], "s", label="dclab")
plt.plot(df["pos_x"], df["inert_ratio_prnc_new"], ".", label="new algorithm")
plt.xlabel("Position in channel (µm)")
plt.ylabel("principal inertia ratio")
plt.title("Principal inertia ratio - comparison dclab vs. new algorithm")
plt.legend()
plt.savefig("inert_ratio_prnc_dclab_vs_new_alogrithm.png")
plt.show()

plt.figure()
plt.plot(df["pos_x"], df["tilt"], "s", label="dclab")
plt.plot(df["pos_x"], df["tilt_new_raw"], ".", label="new algorithm")
plt.xlabel("Position in channel (µm)")
plt.ylabel("tilt")
plt.title("tilt - comparison dclab vs. new algorithm")
plt.legend()
plt.savefig("tilt_dclab_vs_new_alogrithm.png")
plt.show()

plt.figure()
plt.plot(df["pos_x"], df["mu11_new"], "s", label="dclab")
plt.plot(df["pos_x"], df["mu11_dclab"], ".", label="cv2")
plt.xlabel("Position in channel (µm)")
plt.ylabel("mu11")
plt.title("Moment mu11 - comparison dclab vs. cv2")
plt.legend()
plt.savefig("mu11_dclab_vs_cv2.png")
plt.show()

plt.figure()
plt.plot(df["pos_x"], df["mu02_new"], "s", label="dclab")
plt.plot(df["pos_x"], df["mu02_dclab"], ".", label="cv2")
plt.xlabel("Position in channel (µm)")
plt.ylabel("mu02")
plt.title("Moment mu02 - comparison dclab vs. cv2")
plt.legend()
plt.savefig("mu02_dclab_vs_cv2.png")
plt.show()

plt.figure()
plt.plot(df["pos_x"], df["mu20_new"], "s", label="dclab")
plt.plot(df["pos_x"], df["mu20_dclab"], ".", label="cv2")
plt.xlabel("Position in channel (µm)")
plt.ylabel("mu20")
plt.title("Moment mu20 - comparison dclab vs. cv2")
plt.legend()
plt.savefig("mu20_dclab_vs_cv2.png")
plt.show()

These are the results:
mu11_dclab_vs_cv2

mu02_dclab_vs_cv2

mu20_dclab_vs_cv2

--> It seems that the implementation in dclab is different to the one in openCV/cv2. I could imagine that is the reason why the tilt and principal inertia ratio are incorrect in dclab:

inert_ratio_prnc_dclab_vs_new_alogrithm

tilt_dclab_vs_new_alogrithm

I will assume dclab and dcevent have the same code for prnc calculation:

orient = 0.5 * np.arctan2(2 * mu_raw['mu11'],
                              mu_raw['mu02'] - mu_raw['mu20'])
    cc = np.array(cont_raw, dtype=np.float32, copy=True)  # float32 [sic]
    rho = np.sqrt(cc[:, 0] ** 2 + cc[:, 1] ** 2)
    phi = np.arctan2(cc[:, 1], cc[:, 0]) + orient + np.pi / 2
    cc[:, 0] = rho * np.cos(phi)
    cc[:, 1] = rho * np.sin(phi)
    # compute inertia ratio of rotated contour
    mprnc = cv2.moments(cc)
    root_prnc = mprnc["mu20"] / mprnc["mu02"]

The first problem is that orientation angle is calculated wrongly because arctan2 function should take

np.arctan2(2 * -mu_raw['mu11'], mu_raw['mu20'] - mu_raw['mu02'])

The nominator and denominator have wrong sign which than makes arctan2 place the angle in the wrong quartal.
This error was present in the paper with Maik Herbig and probably propagates since then.

Secondly, principal components of the moment of inertia can be easily calculated by diagonalising the central moments matrix. Therefore, there is no need to rotate whole contour and recalculate the moments.
In dcevent:
I will implement new version of the tilt and prnc calculation in dcevent together with whole new set of features that we can be added to the analysis. I plan to have this ready by the time we finish the new segmentation model so that both are implemented at the same time.
In dclab:
When it comes to correcting this bug in dclab, simply replacing

np.arctan2(2 * mu_raw['mu11'], mu_raw['mu02'] - mu_raw['mu20'])

with

np.arctan2(2 * -mu_raw['mu11'], mu_raw['mu20'] - mu_raw['mu02'])

should correct the problem.

Thanks for the thorough investigation ❤️

Ar you planning to create a PR to fix this or should I take over from here?

I think it would be easier and faster if you would update the code, @paulmueller.

Yes, Paul maybe you could take over from here. Glad that I could help.

Simply flipping the two signs is not sufficient. I am investigating...

No, this is not the culprit.

Before flipping the sign:
image

After flipping the sign:
image

is it also a float64 problem?
btw. have you flipped signs in nominator and denominator (mu_raw['mu20'] - mu_raw['mu02']) ?

It strongly believe it was an integer overflow issue. The contour data are handled internally as int32. When calling cont_moments_cv, for some reason there is a data type conversion taking place (which does not make sense to me right now - it was to avoid integer overflows).

When the moments should have been

{'m00': 2072.0,
 'm10': 1763336.1666666665,
 'm01': 98008.66666666666,
 'm20': 1501137877.8333333,
 'm11': 83402392.91666666,
 'm02': 4879924.666666666,
 'm30': 1278338176690.35,
 'm21': 70995855422.88333,
 'm12': 4152781179.116667,
 'm03': 253869174.4,
 'mu20': 484192.179523468,
 'mu11': -6017.594969972968,
 'mu02': 243969.67546117585,
 'mu30': -770365.7822265625,
 'mu21': 54359.37913309783,
 'mu12': 383446.2870258689,
 'mu03': -38738.347044467926}

they were computed to

{'m00': 2072.0,
 'm10': 1763336.1666666665,
 'm01': 98008.66666666666,
 'm20': -288431828.8333333,
 'm11': 83402392.91666666,
 'm02': 4879924.666666666,
 'm30': -1347329152.8500001,
 'm21': -586932843.7833333,
 'm12': -714848423.0166667,
 'm03': 253869174.4,
 'mu20': -1789085514.487143,
 'mu11': -6017.594969972968,
 'mu02': 243969.67546117585,
 'mu30': 3289251445610.116,
 'mu21': 13066561873.33879,
 'mu12': -4867246155.846308,
 'mu03': -38738.347044467926}

One solution is to convert the contour data to a floating point array. I am trying to figure out whether there is a better solution.

BTW this also affected the computation of the other inertia ratios as well as tilt.

image

Only tilt differs, because tilt is defined as strictly positive in dclab:
image

image

image

image

Note that our (super private 😶) dcevent is not affected by this bug, since it is using the OpenCV implementation for the computation of moments all-the-way-through, which does not use int32 for contours.