yfukai/m2stitch

ValueError: Found array with 1 sample(s) (shape=(1, 2)) while a minimum of 2 is required by MinCovDet.

mimuelle opened this issue · 9 comments

Hi!
I was surprised how hard it is to find a nice Python implementation of stitching algorithms, and all the more excited to find m2stitch!
Unfortunately, I have been unable to get it to work on my data, despite several attempted fixes.
It does run on the example data that you provide, however.

I sent a download link to the email address that I found on your website. The folder contains an example of my data along with a minimal Jupyter notebook to load the data and run the stitching. If you find a moment to run it on this data, that would be amazing! Otherwise, any help interpreting the errors is also highly appreciated.

Below are the errors that I get.

Initially, when running the algorithm, I get the following error:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[5], line 3
      1 # Note : the row_col_transpose=True is kept only for the sake of version compatibility.
      2 # In the mejor version, the row_col_transpose=False will be the default.
----> 3 result_df, _ = m2stitch.stitch_images(images, rows_list, cols_list, row_col_transpose=False)

File ~/opt/anaconda3/envs/20221213_janin_iterative_fish/lib/python3.9/site-packages/m2stitch/stitching.py:192, in stitch_images(images, rows, cols, position_indices, position_initial_guess, overlap_diff_threshold, pou, full_output, row_col_transpose)
    189             grid.loc[i2, f"{direction}_{key}_first"] = max_peak[j]
    191 # TODO make threshold adjustable
--> 192 assert np.any(grid["top_ncc_first"] > 0.5), "there is no good top pair"
    193 assert np.any(grid["left_ncc_first"] > 0.5), "there is no good left pair"
    194 predictor = ElipticEnvelopPredictor(contamination=0.4, epsilon=0.01, random_seed=0)

AssertionError: there is no good top pair

Based on your comments on previous issues, I changed the hard-coded ncc cutoff value from 0.5 to 0.1 (is that a reasonable value?), which got rid of this error.

However, when trying again I ran into the following error:

ValueError                                Traceback (most recent call last)
Cell In[5], line 3
      1 # Note : the row_col_transpose=True is kept only for the sake of version compatibility.
      2 # In the mejor version, the row_col_transpose=False will be the default.
----> 3 result_df, _ = m2stitch.stitch_images(images, rows_list, cols_list, row_col_transpose=False)

File ~/opt/anaconda3/envs/20221213_janin_iterative_fish/lib/python3.9/site-packages/m2stitch/stitching.py:198, in stitch_images(images, rows, cols, position_indices, position_initial_guess, overlap_diff_threshold, pou, full_output, row_col_transpose)
    194 predictor = ElipticEnvelopPredictor(contamination=0.4, epsilon=0.01, random_seed=0)
    195 left_displacement = compute_image_overlap2(
    196     grid[grid["left_ncc_first"] > 0.1], "left", sizeY, sizeX, predictor
    197 )
--> 198 top_displacement = compute_image_overlap2(
    199     grid[grid["top_ncc_first"] > 0.1], "top", sizeY, sizeX, predictor
    200 )
    201 overlap_top = np.clip(100 - top_displacement[0] * 100, pou, 100 - pou)
    202 overlap_left = np.clip(100 - left_displacement[1] * 100, pou, 100 - pou)

File ~/opt/anaconda3/envs/20221213_janin_iterative_fish/lib/python3.9/site-packages/m2stitch/_stage_model.py:45, in compute_image_overlap2(grid, direction, sizeY, sizeX, predictor)
     38 translation = np.array(
     39     [
     40         grid[f"{direction}_y_first"].values / sizeY,
     41         grid[f"{direction}_x_first"].values / sizeX,
     42     ]
     43 )
     44 translation = translation[:, np.all(np.isfinite(translation), axis=0)]
---> 45 c = predictor(translation.T)
     46 res = np.median(translation[:, c == 1], axis=1)
     47 assert len(res) == 2

File ~/opt/anaconda3/envs/20221213_janin_iterative_fish/lib/python3.9/site-packages/m2stitch/stitching.py:42, in ElipticEnvelopPredictor.__call__(self, X)
     40 rng = np.random.default_rng(self.random_seed)
     41 X = rng.normal(size=X.shape) * self.epsilon + X
---> 42 return ee.fit_predict(X) > 0

File ~/opt/anaconda3/envs/20221213_janin_iterative_fish/lib/python3.9/site-packages/sklearn/base.py:967, in OutlierMixin.fit_predict(self, X, y)
    949 """Perform fit on X and returns labels for X.
    950 
    951 Returns -1 for outliers and 1 for inliers.
   (...)
    964     1 for inliers, -1 for outliers.
    965 """
    966 # override for transductive outlier detectors like LocalOulierFactor
--> 967 return self.fit(X).predict(X)

File ~/opt/anaconda3/envs/20221213_janin_iterative_fish/lib/python3.9/site-packages/sklearn/covariance/_elliptic_envelope.py:182, in EllipticEnvelope.fit(self, X, y)
    166 """Fit the EllipticEnvelope model.
    167 
    168 Parameters
   (...)
    179     Returns the instance itself.
    180 """
    181 # `_validate_params` is called in `MinCovDet`
--> 182 super().fit(X)
    183 self.offset_ = np.percentile(-self.dist_, 100.0 * self.contamination)
    184 return self

File ~/opt/anaconda3/envs/20221213_janin_iterative_fish/lib/python3.9/site-packages/sklearn/covariance/_robust_covariance.py:740, in MinCovDet.fit(self, X, y)
    723 """Fit a Minimum Covariance Determinant with the FastMCD algorithm.
    724 
    725 Parameters
   (...)
    737     Returns the instance itself.
    738 """
    739 self._validate_params()
--> 740 X = self._validate_data(X, ensure_min_samples=2, estimator="MinCovDet")
    741 random_state = check_random_state(self.random_state)
    742 n_samples, n_features = X.shape

File ~/opt/anaconda3/envs/20221213_janin_iterative_fish/lib/python3.9/site-packages/sklearn/base.py:535, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, **check_params)
    533     raise ValueError("Validation should be done on X, y or both.")
    534 elif not no_val_X and no_val_y:
--> 535     X = check_array(X, input_name="X", **check_params)
    536     out = X
    537 elif no_val_X and not no_val_y:

File ~/opt/anaconda3/envs/20221213_janin_iterative_fish/lib/python3.9/site-packages/sklearn/utils/validation.py:929, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    927     n_samples = _num_samples(array)
    928     if n_samples < ensure_min_samples:
--> 929         raise ValueError(
    930             "Found array with %d sample(s) (shape=%s) while a"
    931             " minimum of %d is required%s."
    932             % (n_samples, array.shape, ensure_min_samples, context)
    933         )
    935 if ensure_min_features > 0 and array.ndim == 2:
    936     n_features = array.shape[1]

ValueError: Found array with 1 sample(s) (shape=(1, 2)) while a minimum of 2 is required by MinCovDet.

Any ideas on what the problem is? When I plot the images in the grid layout, the order is clearly correct and the overlaps look sufficient for proper stitching.
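Reading the traceback, the rows handed to the predictor appear to pass through two filters first: the `top_ncc_first > 0.1` mask in `stitch_images`, and the `np.isfinite` check in `compute_image_overlap2`. If only one pair survives both, `MinCovDet` receives a `(1, 2)` array and raises the error above. The following is a minimal sketch of that counting logic in plain Python, not m2stitch code; `usable_pairs`, `check_enough_pairs`, and the sample values are hypothetical names and data chosen for illustration.

```python
import math

# Minimum sample count that MinCovDet enforces, per the error message above.
MIN_SAMPLES = 2


def usable_pairs(ncc_values, translations, ncc_threshold):
    """Return the (dy, dx) translations that survive both filters.

    Hypothetical helper mirroring the two steps visible in the traceback:
    an NCC threshold mask followed by a finiteness check.
    """
    kept = []
    for ncc, (dy, dx) in zip(ncc_values, translations):
        # A NaN NCC compares False against the threshold, so it is dropped.
        if ncc > ncc_threshold and math.isfinite(dy) and math.isfinite(dx):
            kept.append((dy, dx))
    return kept


def check_enough_pairs(ncc_values, translations, ncc_threshold, direction):
    """Raise a descriptive error when too few pairs remain for outlier fitting."""
    kept = usable_pairs(ncc_values, translations, ncc_threshold)
    if len(kept) < MIN_SAMPLES:
        raise ValueError(
            f"only {len(kept)} usable {direction} pair(s) with ncc > "
            f"{ncc_threshold}; at least {MIN_SAMPLES} are needed"
        )
    return kept


if __name__ == "__main__":
    # Made-up values: only the second pair passes both filters.
    ncc = [0.05, 0.3, float("nan"), 0.02]
    shifts = [(0.9, 0.0), (0.88, 0.01), (0.9, 0.0), (float("inf"), 0.0)]
    try:
        check_enough_pairs(ncc, shifts, 0.1, "top")
    except ValueError as err:
        print(err)
```

A check like this only makes the failure easier to read; it does not say why so few pairs score above the threshold in the first place.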

Thank you very much for your help!

Thanks, @mimuelle, for reporting this! I'm happy to hear about your interest in this package. Unfortunately, I'm not sure how soon I can find time for it, but let me investigate this within two weeks or so.
Also, I'm sorry for the unhelpful error messages. This is a somewhat old project, written when I was less experienced, and at the time I focused on tracing the logic of MIST with limited time.
Rather than incrementally improving this package, I'm now trying to implement something new at https://github.com/yfukai/MicroTailor (although it is just a scaffold for now). If you would allow me to include the data as test data in the repository, I can make sure that the new package works with your data. However, I would totally understand if you cannot permit that for licensing reasons.
Anyway, thanks a lot, and I'm looking forward to hearing from you!

That would be great, thank you for looking into it!
I will get in touch with the people who acquired the data and get back to you, but I think it should be fine.
I'm looking forward to hearing from you!

Hi @mimuelle, I think the cause is that the ncc threshold was only partially rewritten.
Can you try the stitching with the latest GitHub version, installed with the following command?

pip install git+https://github.com/yfukai/m2stitch@master

In this version, you can set the ncc_threshold parameter as follows:

result_df, _ = m2stitch.stitch_images(
    images,
    rows_list,
    cols_list,
    row_col_transpose=False,
    ncc_threshold=0.1,
)

It worked in my environment. If it is fine to include the data in the repo, I'll release the updated version.
Cheers, YF

Reopening, as this was mistakenly closed.

Hi @yfukai ,
The latest version works now, thank you very much for fixing it so quickly!!
I will ask for permission to share the data next week, so maybe you can leave this issue open for 2 more days until we have figured that out as well.
Thanks again!

Thanks @mimuelle! In any case, I'll make a new release in a couple of weeks, but it would also be great if I could include your data as test data for this edge case.

Hi @yfukai ,
I discussed the data with my colleagues, and it is fine for you to include it as test data in the next release!
Thank you again for the quick fix and all the best!

Closed by #355.

Version 0.7.0 will include this feature!