KeyError with new v1.3.1: output from determine_multiscale_space
zktuong opened this issue · 6 comments
Hi,
I'm trying to run a simple chunk like so:
pca_projections = pd.DataFrame(pb_adata.obsm["X_pca"], index=pb_adata.obs_names)
dm_res = palantir.utils.run_diffusion_maps(pca_projections, n_components=5)
ms_data = palantir.utils.determine_multiscale_space(dm_res)
pr_res = palantir.core.run_palantir(
ms_data,
pb_adata.obs_names[rootcell],
num_waypoints=500,
terminal_states=terminal_states.index,
)
but it's triggering an error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/Users/uqztuong/Library/CloudStorage/OneDrive-TheUniversityofQueensland/Documents/GitHub/dandelion/docs/notebooks/8-pseudobulk-trajectory.ipynb Cell 20 line 1
14 dm_res = palantir.utils.run_diffusion_maps(pca_projections, n_components=5)
15 ms_data = palantir.utils.determine_multiscale_space(dm_res)
---> 17 pr_res = palantir.core.run_palantir(
18 ms_data,
19 pb_adata.obs_names[rootcell],
20 num_waypoints=500,
21 terminal_states=terminal_states.index,
22 )
24 pr_res.branch_probs.columns = terminal_states[pr_res.branch_probs.columns]
File ~/Library/CloudStorage/OneDrive-TheUniversityofQueensland/Documents/GitHub/Palantir/src/palantir/core.py:129, in run_palantir(data, early_cell, terminal_states, knn, num_waypoints, n_jobs, scale_components, use_early_cell_as_start, max_iterations, eigvec_key, pseudo_time_key, entropy_key, fate_prob_key, save_as_df, waypoints_key, seed)
125 # ################################################
126 # Determine the boundary cell closest to user defined early cell
127 dm_boundaries = pd.Index(set(data_df.idxmax()).union(data_df.idxmin()))
128 dists = pairwise_distances(
--> 129 data_df.loc[dm_boundaries, :], data_df.loc[early_cell, :].values.reshape(1, -1)
130 )
131 start_cell = pd.Series(np.ravel(dists), index=dm_boundaries).idxmin()
132 if use_early_cell_as_start:
File /opt/homebrew/Caskroom/mambaforge/base/envs/dandelion/lib/python3.11/site-packages/pandas/core/indexing.py:1067, in _LocationIndexer.__getitem__(self, key)
1065 if self._is_scalar_access(key):
1066 return self.obj._get_value(*key, takeable=self._takeable)
-> 1067 return self._getitem_tuple(key)
1068 else:
1069 # we by definition only have the 0th axis
1070 axis = self.axis or 0
File /opt/homebrew/Caskroom/mambaforge/base/envs/dandelion/lib/python3.11/site-packages/pandas/core/indexing.py:1247, in _LocIndexer._getitem_tuple(self, tup)
1245 with suppress(IndexingError):
1246 tup = self._expand_ellipsis(tup)
-> 1247 return self._getitem_lowerdim(tup)
1249 # no multi-index, so validate all of the indexers
1250 tup = self._validate_tuple_indexer(tup)
File /opt/homebrew/Caskroom/mambaforge/base/envs/dandelion/lib/python3.11/site-packages/pandas/core/indexing.py:967, in _LocationIndexer._getitem_lowerdim(self, tup)
963 for i, key in enumerate(tup):
964 if is_label_like(key):
965 # We don't need to check for tuples here because those are
966 # caught by the _is_nested_tuple_indexer check above.
--> 967 section = self._getitem_axis(key, axis=i)
969 # We should never have a scalar section here, because
970 # _getitem_lowerdim is only called after a check for
971 # is_scalar_access, which that would be.
972 if section.ndim == self.ndim:
973 # we're in the middle of slicing through a MultiIndex
974 # revise the key wrt to `section` by inserting an _NS
File /opt/homebrew/Caskroom/mambaforge/base/envs/dandelion/lib/python3.11/site-packages/pandas/core/indexing.py:1312, in _LocIndexer._getitem_axis(self, key, axis)
1310 # fall thru to straight lookup
1311 self._validate_key(key, axis)
-> 1312 return self._get_label(key, axis=axis)
File /opt/homebrew/Caskroom/mambaforge/base/envs/dandelion/lib/python3.11/site-packages/pandas/core/indexing.py:1260, in _LocIndexer._get_label(self, label, axis)
1258 def _get_label(self, label, axis: int):
1259 # GH#5567 this will fail if the label is not present in the axis.
-> 1260 return self.obj.xs(label, axis=axis)
File /opt/homebrew/Caskroom/mambaforge/base/envs/dandelion/lib/python3.11/site-packages/pandas/core/generic.py:4056, in NDFrame.xs(self, key, axis, level, drop_level)
4054 new_index = index[loc]
4055 else:
-> 4056 loc = index.get_loc(key)
4058 if isinstance(loc, np.ndarray):
4059 if loc.dtype == np.bool_:
File /opt/homebrew/Caskroom/mambaforge/base/envs/dandelion/lib/python3.11/site-packages/pandas/core/indexes/range.py:395, in RangeIndex.get_loc(self, key, method, tolerance)
393 raise KeyError(key) from err
394 self._check_indexing_error(key)
--> 395 raise KeyError(key)
396 return super().get_loc(key, method=method, tolerance=tolerance)
KeyError: '710'
My pb_adata.obs_names are ['0', '1', '2', ..., '1360'].
To solve this, I had to do:
pca_projections = pd.DataFrame(pb_adata.obsm["X_pca"], index=pb_adata.obs_names)
dm_res = palantir.utils.run_diffusion_maps(pca_projections, n_components=5)
ms_data = palantir.utils.determine_multiscale_space(dm_res)
ms_data.index = ms_data.index.astype(str)  # cast the index back to string labels so lookups by obs_names work
pr_res = palantir.core.run_palantir(
ms_data,
pb_adata.obs_names[rootcell],
num_waypoints=500,
terminal_states=terminal_states.index,
)
This isn't an issue in v1.3.0 but is occurring for me in v1.3.1.
I see that you've made some changes to run_diffusion_maps recently; can you think of why this is happening?
Hi @zktuong,
Thank you for your detailed report. I've reviewed the changes between v1.3.0 and v1.3.1 in palantir.core: Comparison Link. Surprisingly, I found no modifications affecting your case. Additionally, the failing call data_df.loc[early_cell, :] remains unchanged from v1.3.0: Commit Reference.
Likewise, palantir.utils.determine_multiscale_space appears unchanged for your use case.
For debugging, could you check whether the index dtype undergoes a conversion, perhaps to a categorical type? This may be due to differing versions of Scanpy or pandas.
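For example, a quick check along these lines (a minimal sketch, reusing the variable names from your snippet) would show whether the labels survive:

print(pb_adata.obs_names.dtype)       # has obs_names become categorical upstream?
print(pca_projections.index.dtype)    # is the index of the input DataFrame still string-typed?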
As an immediate remedy, Palantir v1.3.1 supports direct AnnData object input, eliminating the need to manually create DataFrames. Here's an example; adjust parameters as needed.
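Something along these lines (a sketch assuming the v1.3 AnnData interface; the parameter values are carried over from your snippet and may need tuning):

import palantir

# operate on the AnnData object directly instead of hand-built DataFrames
palantir.utils.run_diffusion_maps(pb_adata, n_components=5)
palantir.utils.determine_multiscale_space(pb_adata)
pr_res = palantir.core.run_palantir(
    pb_adata,
    pb_adata.obs_names[rootcell],
    num_waypoints=500,
    terminal_states=terminal_states.index,
)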
Looking forward to your insights and whether this mitigates your issue.
Hi there, thanks for the prompt response!
Your solution works!
However, for completeness, the source of the issue is that dm_res["EigenVectors"].index returns a RangeIndex instead of an Index in v1.3.1.
I tested v1.2.0, v1.3.0, and v1.3.1 over here: https://github.com/zktuong/troubleshooting_palantir/ (look at cell 8 in the three notebooks).
I suppose this is taken care of within anndata, but if a user doesn't want to use anndata and just wants to use a pandas DataFrame, then it will cause this issue.
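For reference, this is the check I mean (illustrative, reusing the names from the first snippet above):

dm_res = palantir.utils.run_diffusion_maps(pca_projections, n_components=5)
print(type(dm_res["EigenVectors"].index))
# v1.3.0: pandas Index carrying the original string obs_names
# v1.3.1: pandas RangeIndex, so the .loc lookup by '710' inside run_palantir raises KeyError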
So looking at the code, it would be here: Palantir/src/palantir/utils.py, lines 397 to 398 in 580ac87.
Not sure how to adjust it without breaking stuff...
Thanks for the insightful analysis! A refactor inadvertently omitted lines that manage the index; this has been rectified in this commit. For those keen to test the hotfix, execute the following:
pip install 'git+https://github.com/settylab/Palantir'
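After installing, a minimal check along these lines (reusing the names from the original report; pb_adata is your pseudobulk object) should show the string labels being preserved again:

import pandas as pd
import palantir

pca_projections = pd.DataFrame(pb_adata.obsm["X_pca"], index=pb_adata.obs_names)
dm_res = palantir.utils.run_diffusion_maps(pca_projections, n_components=5)
assert dm_res["EigenVectors"].index.equals(pca_projections.index)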
Please feel free to report any feedback or reopen the issue if it persists despite the patch!
Thanks for the swift update! Works now!