YosefLab/scib-metrics

Different numbers of neighbors for cells with clisi_knn

yuyanMaggieliu opened this issue · 3 comments

Report

I am running the scib_metrics benchmark following the tutorial (Benchmarking lung integration from ). I ran 4 methods and saved the embeddings in obsm as the tutorial describes. However, when I create bm = Benchmarker(adata) and call bm.benchmark(), the clisi_knn metric raises "ValueError: Each cell must have the same number of neighbors.".

The full error message is:

File ~/.conda/envs/ju/lib/python3.9/site-packages/scib_metrics/benchmark/_core.py:226, in Benchmarker.benchmark(self)
223 if isinstance(use_metric_or_kwargs, dict):
224 # Kwargs in this case
225 metric_fn = partial(metric_fn, **use_metric_or_kwargs)
--> 226 metric_value = getattr(MetricAnnDataAPI, metric_name)(ad, metric_fn)
227 # nmi/ari metrics return a dict
228 if isinstance(metric_value, dict):

File ~/.conda/envs/ju/lib/python3.9/site-packages/scib_metrics/benchmark/_core.py:88, in MetricAnnDataAPI.<lambda>(ad, fn)
86 nmi_ari_cluster_labels_kmeans = lambda ad, fn: fn(ad.X, ad.obs[_LABELS])
87 silhouette_label = lambda ad, fn: fn(ad.X, ad.obs[_LABELS])
---> 88 clisi_knn = lambda ad, fn: fn(ad.obsp["90_distances"], ad.obs[_LABELS])
89 graph_connectivity = lambda ad, fn: fn(ad.obsp["15_distances"], ad.obs[_LABELS])
90 silhouette_batch = lambda ad, fn: fn(ad.X, ad.obs[_LABELS], ad.obs[_BATCH])

File ~/.conda/envs/ju/lib/python3.9/site-packages/scib_metrics/_lisi.py:98, in clisi_knn(X, labels, perplexity, scale)
74 """Compute the cell-type local inverse simpson index (cLISI) for each cell :cite:p:korsunsky2019harmony.
75
76 Returns a scaled version of the cLISI score for each cell, by default :cite:p:luecken2022benchmarking.
(...)
95 Array of shape (n_cells,) with the cLISI score for each cell.
96 """
97 labels = np.asarray(pd.Categorical(labels).codes)
---> 98 lisi = lisi_knn(X, labels, perplexity=perplexity)
99 clisi = np.nanmedian(lisi)
100 if scale:

File ~/.conda/envs/ju/lib/python3.9/site-packages/scib_metrics/_lisi.py:29, in lisi_knn(X, labels, perplexity)
9 """Compute the local inverse simpson index (LISI) for each cell :cite:p:korsunsky2019harmony.
10
11 Parameters
(...)
26 Array of shape (n_cells,) with the LISI score for each cell.
27 """
28 labels = np.asarray(pd.Categorical(labels).codes)
---> 29 knn_dists, knn_idx = convert_knn_graph_to_idx(X)
31 if perplexity is None:
32 perplexity = np.floor(knn_idx.shape[1] / 3)

File ~/.conda/envs/ju/lib/python3.9/site-packages/scib_metrics/utils/_utils.py:60, in convert_knn_graph_to_idx(X)
58 print(np.unique(n_neighbors))
59 if len(np.unique(n_neighbors)) > 1:
---> 60 raise ValueError("Each cell must have the same number of neighbors.")
62 n_neighbors = int(np.unique(n_neighbors)[0])
63 with warnings.catch_warnings():

ValueError: Each cell must have the same number of neighbors.

Version information

No response

Hi, thank you for reporting this issue. I have usually seen this happen when a subset of the cell populations is used without recomputing or extending the neighborhood graph. I hope that helps.
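
To illustrate why subsetting can trigger this error, here is a minimal sketch that uses only NumPy and SciPy rather than scib-metrics or scanpy. The helpers `knn_distance_graph` and `neighbors_per_row` are hypothetical names made up for this example, and the sketch assumes that the check in the traceback counts the non-zero entries in each row of the sparse distance graph. Slicing a precomputed kNN graph drops edges to removed cells, so some rows end up with fewer neighbors; recomputing the graph on the subset restores a constant count.

```python
import numpy as np
from scipy.sparse import csr_matrix


def knn_distance_graph(points, k):
    """Brute-force kNN distance graph with exactly k stored neighbors per row (self excluded)."""
    n = points.shape[0]
    diff = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # never pick a cell as its own neighbor
    idx = np.argsort(dists, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    return csr_matrix((dists[rows, cols], (rows, cols)), shape=(n, n))


def neighbors_per_row(graph):
    """Count non-zero entries per row of a sparse graph."""
    return np.bincount(graph.nonzero()[0], minlength=graph.shape[0])


# Ten toy "cells" on a line, with k = 2 neighbors each.
points = np.arange(10, dtype=float).reshape(-1, 1)
full_graph = knn_distance_graph(points, k=2)
full_counts = neighbors_per_row(full_graph)

# Subset to the first five cells by slicing the existing graph:
# cell 4 loses its edge to the removed cell 5.
keep = np.arange(5)
sliced_counts = neighbors_per_row(full_graph[keep][:, keep])

# Recomputing the graph on the subset gives every cell k neighbors again.
recomputed_counts = neighbors_per_row(knn_distance_graph(points[keep], k=2))

print(full_counts, sliced_counts, recomputed_counts)
```

In an AnnData workflow, the equivalent step would be to rebuild the neighbor graph on the subsetted object before benchmarking, rather than reusing the graph stored in obsp from the full dataset.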

I also ran into this issue in a case where some cells have additional neighbors (other than themselves) with distance 0. Adding a tiny value to the distances of those neighbors as follows resolved the issue:

neigh_output = pynndescent(
    adata.obsm[latent_key],
    n_neighbors=n_neighbors,
    random_state=random_state,
    n_jobs=n_jobs)
indices, distances = neigh_output.indices, neigh_output.distances

# Find zero distances that point to a cell other than the cell itself.
row_idx, col_idx = np.where(distances == 0)
not_self = row_idx != indices[row_idx, col_idx]
new_row_idx, new_col_idx = row_idx[not_self], col_idx[not_self]
# Add the smallest positive float32 so these neighbors are not treated as zeros.
distances[new_row_idx, new_col_idx] = (distances[new_row_idx, new_col_idx] +
                                       np.nextafter(0, 1, dtype=np.float32))

sp_distances, sp_conns = sc.neighbors._compute_connectivities_umap(
        indices[:, :n_neighbors],
        distances[:, :n_neighbors],
        adata.n_obs,
        n_neighbors=n_neighbors)
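
The effect of the nudge can be shown on a toy example. The following is a simplified sketch using NumPy and SciPy only, not the scib-metrics or scanpy code path; `nudge_zero_distances` and `stored_neighbors_per_row` are hypothetical helpers, and the sketch assumes that neighbors are counted as non-zero entries per row of the sparse distance matrix. Cells 0 and 1 are exact duplicates, so their mutual distance is 0 and that neighbor disappears from the count until it is nudged to the smallest positive float32.

```python
import numpy as np
from scipy.sparse import csr_matrix


def nudge_zero_distances(indices, distances):
    """Return a copy where zero distances to non-self neighbors become the smallest positive float32."""
    distances = np.asarray(distances, dtype=np.float32).copy()
    self_idx = np.arange(indices.shape[0])[:, None]
    mask = (distances == 0) & (indices != self_idx)
    distances[mask] = np.nextafter(np.float32(0), np.float32(1))
    return distances


def stored_neighbors_per_row(indices, distances):
    """Build a sparse distance graph and count its non-zero entries per row."""
    n_obs, k = indices.shape
    rows = np.repeat(np.arange(n_obs), k)
    graph = csr_matrix((distances.ravel(), (rows, indices.ravel())), shape=(n_obs, n_obs))
    return np.bincount(graph.nonzero()[0], minlength=n_obs)


# Toy kNN output for four cells (first column is the cell itself, distance 0).
# Cells 0 and 1 share the same coordinates, so they are at distance 0 from each other.
toy_indices = np.array([[0, 1, 2], [1, 0, 2], [2, 3, 0], [3, 2, 1]])
toy_distances = np.array(
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.5, 1.0], [0.0, 0.5, 1.5]],
    dtype=np.float32,
)

counts_before = stored_neighbors_per_row(toy_indices, toy_distances)
counts_after = stored_neighbors_per_row(
    toy_indices, nudge_zero_distances(toy_indices, toy_distances)
)
print(counts_before, counts_after)
```

The self-distance is left at 0 on purpose: it is dropped for every cell alike, so it does not make the per-cell counts differ.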

May I ask you to clarify what you did and in which part of the pipeline you added that?