Bug: Events have principal inertia ratio <1

Question

Bug: Events have principal inertia ratio <1

Closed this issue 2 years ago · 19 comments

dclab version 0.48.1
Windows 10
python 3.9.0

When plotting the principal inertia ratio for a dataset in Shape-Out 2, I noticed values below 1, which should not be there, since the principal inertia ratio calculates the ratio of major over minor axis and should be per definition invariant to rotation.

The selected event looks normal and is one of the events with principal inertia axis < 1

Answer 1 · 2023-02-22T18:31:42.000Z

The algorithm for computing the principal inertia ratio did not change recently.
So this bug probably has been there since the start.

The correlation with the position along the channel axis is interesting.

I assume the principal inertia ratio is here computed on-the-fly by dclab, correct?

Answer 2 · 2023-02-23T07:46:09.000Z

How can I check if the principal inertia ratio is computed on the fly?
If I check the file with h5py, like so

h = h5py.File(some_rtdc_file.rtdc)
h["events"].keys()
>>> <KeysViewHDF5 ['area_msd', 'area_ratio', 'area_um', 'aspect', 'bright_avg', 'bright_sd', 'circ', 'contour', 'corr_deform', 'deform', 'fl1_area', 'fl1_max', 'fl1_pos', 'fl1_width', 'fl2_area', 'fl2_max', 'fl2_pos', 'fl2_width', 'fl3_area', 'fl3_max', 'fl3_pos', 'fl3_width', 'frame', 'image', 'index', 'index_online', 'inert_ratio_cvx', 'inert_ratio_raw', 'mask', 'pos_x', 'pos_y', 'size_x', 'size_y', 'temp', 'time', 'trace', 'volume']>

I don't see inert_ratio_prnc there.

Answer 3 · 2023-02-23T08:01:00.000Z

The correlation with the position along the channel axis is interesting indeed. Also visible in other datasets:

Answer 4 · 2023-02-23T08:42:12.000Z

I wonder, is the image width here longer than 256? Maybe this is some kind of integer overflow...

Answer 5 · 2023-02-23T09:29:51.000Z

Do you have an example dataset I can take a look at? This does not appear to happen for normal datasets with short image widths.

Answer 6 · 2023-02-23T10:33:07.000Z

Yes, the region of interest is quite long. I attached an example file with 300 events.
example_file_dclab_issue_212.zip

Answer 7 · 2023-02-24T09:49:51.000Z

I don't think the position matters:
I created a new file only with events with principal inertia ratio <1, then modified the file with h5py basically subtracting 200 from all position_x values and then checked the principal inertia ratio again. The plot looks basically the same.

Code:

import dclab
import h5py

ds_filtered = dclab.new_dataset("input_data.rtdc")
ds_filtered.config["filtering"]["inert_ratio_prnc min"] = 0
ds_filtered.config["filtering"]["inert_ratio_prnc max"] = 1
ds_filtered.apply_filter()
features = ds_filtered.features
if "inert_ratio_prnc" in features:
    features.remove('inert_ratio_prnc')
ds_filtered.export.hdf5(path='events_false_principal_inertia_ratio.rtdc', features=features, filtered=True, override=True)

with h5py.File('events_false_principal_inertia_ratio.rtdc', mode="r+") as h:
    for i in range(len(h['events']['pos_x'])):
        h['events']['pos_x'][i] = h['events']['pos_x'][i] - 200

ds_reduced_position = dclab.new_dataset("events_false_principal_inertia_ratio.rtdc")
plt.figure()
plt.plot(ds_reduced_position["pos_x"], ds_reduced_position["inert_ratio_prnc"], ".")
plt.ylim([0, 1])

Before:

After:

Answer 8 · 2023-03-15T08:18:35.000Z

I compared the central moments as computed by dclab with the ones computed by cv2. I used the following code for calculation and plotting. @SaraKaliman implemented the calculation of tilt angle and central moments in principal axes:

import dclab
import pandas as pd
import cv2
import numpy as np
import matplotlib.pyplot as plt


ds = dclab.new_dataset(r"example_file_dclab_issue_212.rtdc")
contours = ds["contour"][:]

features_names = ["inert_ratio_prnc_new", "tilt_new_raw",
                  "mu11_new", "mu02_new", "mu20_new", "mu11_dclab",
                  "mu02_dclab", "mu20_dclab"]

new_features = pd.DataFrame()

for n in range(len(contours)):
    cont_raw = contours[n]

    # moments up to 3rd order
    mu_raw = cv2.moments(cont_raw)
    mu_raw_dclab = dclab.features.inert_ratio.cont_moments_cv(cont_raw)

    # calculate moments on raw contour
    Ixx_raw = np.float64(mu_raw["mu02"])
    Iyy_raw = np.float64(mu_raw["mu20"])
    Ixy_raw = np.float64(mu_raw["mu11"])

    # tilt angle in radians
    angle_fi_raw = 0.5 * np.arctan2(-2 * Ixy_raw, Iyy_raw - Ixx_raw)

    # Moments of Inertia / Central moments in principal axes
    I1_raw = 0.5 * (Ixx_raw + Iyy_raw + np.sqrt(
        (Ixx_raw - Iyy_raw) ** 2 + 4 * (Ixy_raw ** 2)))
    I2_raw = 0.5 * (Ixx_raw + Iyy_raw - np.sqrt(
        (Ixx_raw - Iyy_raw) ** 2 + 4 * (Ixy_raw ** 2)))

    cell_features = np.array([[np.sqrt(I1_raw / I2_raw),
                               angle_fi_raw,
                               mu_raw["mu11"],
                               mu_raw["mu02"],
                               mu_raw["mu20"],
                               mu_raw_dclab["mu11"],
                               mu_raw_dclab["mu02"],
                               mu_raw_dclab["mu20"]]])
    cell_features = pd.DataFrame(cell_features, columns=features_names)

    if n == 0:
        new_features = cell_features
    else:
        new_features = pd.concat([new_features, cell_features], ignore_index=True)

df = pd.DataFrame()
for feature in ds.features:
    if feature == "trace":
        pass
    else:
        df[feature] = ds[feature][:]
for feature in features_names:
    df[feature] = new_features[feature]

# inert_ratio_prnc_new
plt.figure()
plt.plot(df["pos_x"], df["inert_ratio_prnc"], "s", label="dclab")
plt.plot(df["pos_x"], df["inert_ratio_prnc_new"], ".", label="new algorithm")
plt.xlabel("Position in channel (µm)")
plt.ylabel("principal inertia ratio")
plt.title("Principal inertia ratio - comparison dclab vs. new algorithm")
plt.legend()
plt.savefig("inert_ratio_prnc_dclab_vs_new_alogrithm.png")
plt.show()

plt.figure()
plt.plot(df["pos_x"], df["tilt"], "s", label="dclab")
plt.plot(df["pos_x"], df["tilt_new_raw"], ".", label="new algorithm")
plt.xlabel("Position in channel (µm)")
plt.ylabel("tilt")
plt.title("tilt - comparison dclab vs. new algorithm")
plt.legend()
plt.savefig("tilt_dclab_vs_new_alogrithm.png")
plt.show()

plt.figure()
plt.plot(df["pos_x"], df["mu11_new"], "s", label="dclab")
plt.plot(df["pos_x"], df["mu11_dclab"], ".", label="cv2")
plt.xlabel("Position in channel (µm)")
plt.ylabel("mu11")
plt.title("Moment mu11 - comparison dclab vs. cv2")
plt.legend()
plt.savefig("mu11_dclab_vs_cv2.png")
plt.show()

plt.figure()
plt.plot(df["pos_x"], df["mu02_new"], "s", label="dclab")
plt.plot(df["pos_x"], df["mu02_dclab"], ".", label="cv2")
plt.xlabel("Position in channel (µm)")
plt.ylabel("mu02")
plt.title("Moment mu02 - comparison dclab vs. cv2")
plt.legend()
plt.savefig("mu02_dclab_vs_cv2.png")
plt.show()

plt.figure()
plt.plot(df["pos_x"], df["mu20_new"], "s", label="dclab")
plt.plot(df["pos_x"], df["mu20_dclab"], ".", label="cv2")
plt.xlabel("Position in channel (µm)")
plt.ylabel("mu20")
plt.title("Moment mu20 - comparison dclab vs. cv2")
plt.legend()
plt.savefig("mu20_dclab_vs_cv2.png")
plt.show()

These are the results:

--> It seems that the implementation in dclab is different to the one in openCV/cv2. I could imagine that is the reason why the tilt and principal inertia ratio are incorrect in dclab:

Answer 9 · 2023-03-15T10:06:01.000Z

I will assume dclab and dcevent have the same code for prnc calculation:

orient = 0.5 * np.arctan2(2 * mu_raw['mu11'],
                              mu_raw['mu02'] - mu_raw['mu20'])
    cc = np.array(cont_raw, dtype=np.float32, copy=True)  # float32 [sic]
    rho = np.sqrt(cc[:, 0] ** 2 + cc[:, 1] ** 2)
    phi = np.arctan2(cc[:, 1], cc[:, 0]) + orient + np.pi / 2
    cc[:, 0] = rho * np.cos(phi)
    cc[:, 1] = rho * np.sin(phi)
    # compute inertia ratio of rotated contour
    mprnc = cv2.moments(cc)
    root_prnc = mprnc["mu20"] / mprnc["mu02"]

The first problem is that orientation angle is calculated wrongly because arctan2 function should take

np.arctan2(2 * -mu_raw['mu11'], mu_raw['mu20'] - mu_raw['mu02'])

The nominator and denominator have wrong sign which than makes arctan2 place the angle in the wrong quartal.
This error was present in the paper with Maik Herbig and probably propagates since then.

Secondly, principal components of the moment of inertia can be easily calculated by diagonalising the central moments matrix. Therefore, there is no need to rotate whole contour and recalculate the moments.
In dcevent:
I will implement new version of the tilt and prnc calculation in dcevent together with whole new set of features that we can be added to the analysis. I plan to have this ready by the time we finish the new segmentation model so that both are implemented at the same time.
In dclab:
When it comes to correcting this bug in dclab, simply replacing

np.arctan2(2 * mu_raw['mu11'], mu_raw['mu02'] - mu_raw['mu20'])

with

np.arctan2(2 * -mu_raw['mu11'], mu_raw['mu20'] - mu_raw['mu02'])

should correct the problem.

Answer 10 · 2023-03-15T10:16:34.000Z

Thanks for the thorough investigation ❤️

Ar you planning to create a PR to fix this or should I take over from here?

Answer 11 · 2023-03-15T10:54:08.000Z

I think it would be easier and faster if you would update the code, @paulmueller.

Answer 12 · 2023-03-15T11:40:19.000Z

Yes, Paul maybe you could take over from here. Glad that I could help.

Answer 13 · 2023-03-15T13:02:51.000Z

Simply flipping the two signs is not sufficient. I am investigating...

Answer 14 · 2023-03-15T13:23:16.000Z

No, this is not the culprit.

Before flipping the sign:

After flipping the sign:

Answer 15 · 2023-03-15T13:42:53.000Z

is it also a float64 problem?
btw. have you flipped signs in nominator and denominator (mu_raw['mu20'] - mu_raw['mu02']) ?

Answer 16 · 2023-03-15T13:48:39.000Z

It strongly believe it was an integer overflow issue. The contour data are handled internally as int32. When calling cont_moments_cv, for some reason there is a data type conversion taking place (~~which does not make sense to me right now~~ - it was to avoid integer overflows).

When the moments should have been

{'m00': 2072.0,
 'm10': 1763336.1666666665,
 'm01': 98008.66666666666,
 'm20': 1501137877.8333333,
 'm11': 83402392.91666666,
 'm02': 4879924.666666666,
 'm30': 1278338176690.35,
 'm21': 70995855422.88333,
 'm12': 4152781179.116667,
 'm03': 253869174.4,
 'mu20': 484192.179523468,
 'mu11': -6017.594969972968,
 'mu02': 243969.67546117585,
 'mu30': -770365.7822265625,
 'mu21': 54359.37913309783,
 'mu12': 383446.2870258689,
 'mu03': -38738.347044467926}

they were computed to

{'m00': 2072.0,
 'm10': 1763336.1666666665,
 'm01': 98008.66666666666,
 'm20': -288431828.8333333,
 'm11': 83402392.91666666,
 'm02': 4879924.666666666,
 'm30': -1347329152.8500001,
 'm21': -586932843.7833333,
 'm12': -714848423.0166667,
 'm03': 253869174.4,
 'mu20': -1789085514.487143,
 'mu11': -6017.594969972968,
 'mu02': 243969.67546117585,
 'mu30': 3289251445610.116,
 'mu21': 13066561873.33879,
 'mu12': -4867246155.846308,
 'mu03': -38738.347044467926}

One solution is to convert the contour data to a floating point array. I am trying to figure out whether there is a better solution.

Answer 17 · 2023-03-15T20:52:21.000Z

BTW this also affected the computation of the other inertia ratios as well as tilt.

Answer 18 · 2023-03-15T21:14:08.000Z

Only tilt differs, because tilt is defined as strictly positive in dclab:

Answer 19 · 2023-03-15T21:23:33.000Z

Note that our (super private 😶) dcevent is not affected by this bug, since it is using the OpenCV implementation for the computation of moments all-the-way-through, which does not use int32 for contours.