Scanpy Integration Not Working with 1.3.0

Question

Closed this issue a year ago · 2 comments

Hello,

I recently updated palantir to the latest release (1.3.0) using pip install -U palantir and found that my previous notebooks no longer work. I was using the scanpy integration through sc.external.tl.palantir(adata) and now get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[56], line 1
----> 1 sc.external.tl.palantir(adata, 
      2                         n_components=20, 
      3                         knn=30)

File ~/miniconda3/envs/sc_analysis/lib/python3.8/site-packages/scanpy/external/tl/_palantir.py:209, in palantir(adata, n_components, knn, alpha, use_adjacency_matrix, distances_key, n_eigs, impute_data, n_steps, copy)
    206     df = pd.DataFrame(adata.obsm['X_pca'], index=adata.obs_names)
    208 # Diffusion maps
--> 209 dm_res = run_diffusion_maps(
    210     data_df=df,
    211     n_components=n_components,
    212     knn=knn,
    213     alpha=alpha,
    214 )
    215 # Determine the multi scale space of the data
    216 ms_data = determine_multiscale_space(dm_res=dm_res, n_eigs=n_eigs)

TypeError: run_diffusion_maps() got an unexpected keyword argument 'data_df'

If I try to recalculate the palantir results from previously computed diffusion maps, I get a similar error:

pr_res = sc.external.tl.palantir_results(adata, early_cell=start_cell, 
                                         ms_data = 'X_palantir_multiscale', num_waypoints=1000)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[8], line 1
----> 1 pr_res = sc.external.tl.palantir_results(adata, early_cell = start_cell, 
      2                                          ms_data='X_palantir_multiscale', num_waypoints=1000)

File ~/miniconda3/envs/sc_analysis/lib/python3.8/site-packages/scanpy/external/tl/_palantir.py:294, in palantir_results(adata, early_cell, ms_data, terminal_states, knn, num_waypoints, n_jobs, scale_components, use_early_cell_as_start, max_iterations)
    291 from palantir.core import run_palantir
    293 ms_data = pd.DataFrame(adata.obsm[ms_data], index=adata.obs_names)
--> 294 pr_res = run_palantir(
    295     ms_data=ms_data,
    296     early_cell=early_cell,
    297     terminal_states=terminal_states,
    298     knn=knn,
    299     num_waypoints=num_waypoints,
    300     n_jobs=n_jobs,
    301     scale_components=scale_components,
    302     use_early_cell_as_start=use_early_cell_as_start,
    303     max_iterations=max_iterations,
    304 )
    306 return pr_res

TypeError: run_palantir() got an unexpected keyword argument 'ms_data'
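
Both tracebacks have the same shape: the scanpy wrapper passes a keyword (data_df, ms_data) that the called function does not accept in this Palantir release. A minimal sketch of detecting this kind of mismatch before calling, using only the standard library, might look like the following. The names accepts_keyword, old_style, and new_style are hypothetical and are not part of scanpy or Palantir.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with `name` as a keyword argument."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    param = params.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


# Hypothetical stand-ins that mimic the two calling conventions.
def old_style(data_df, n_components=10):
    return data_df


def new_style(data, n_components=10):
    return data
```

Against the real library, the same check could be applied as accepts_keyword(palantir.utils.run_diffusion_maps, "data_df"), which, judging by the traceback above, would be expected to return False under 1.3.0.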

I am using the most recent scanpy release 1.9.4. Please let me know if you need any additional information.
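
For reference, the installed versions of both packages can be reported from within the notebook. This is a small sketch using the standard library; installed_version is a hypothetical helper name, and the distribution names "palantir" and "scanpy" are taken from the pip command and release mentioned above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print one line per package, e.g. the name followed by its version or None.
for name in ("palantir", "scanpy"):
    print(name, installed_version(name))
```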

Thank you!

Answer 1 · 2023-09-14T18:00:47.000Z

Hello @dgodovich,

Thank you for bringing this to our attention. Given the deprecation of scanpy.external, we've aligned Palantir's functionality to be similar to the Scanpy wrapper for ease of transition.

Default Approach:

Here is how we suggest running your diffusion maps and Palantir analysis:

import palantir

# Diffusion Maps
palantir.utils.run_diffusion_maps(adata, n_components=20, knn=30)

# Multiscale Space
palantir.utils.determine_multiscale_space(adata)

# Run Palantir
palantir.core.run_palantir(adata, early_cell=start_cell, num_waypoints=1000)

Key Differences:

By default, Palantir stores its results under different adata.obsm key names than the Scanpy wrapper did.

For Scanpy Naming Scheme:

If you're keen on Scanpy's naming scheme, you can explicitly specify the keys as follows:

# Diffusion Maps
palantir.utils.run_diffusion_maps(
    adata, 
    n_components=20, 
    knn=30, 
    eigvec_key="X_palantir_diff_comp",
    eigval_key="palantir_EigenValues",
    sim_key="palantir_diff_op"
)

# Multiscale Space
palantir.utils.determine_multiscale_space(
    adata,
    eigvec_key="X_palantir_diff_comp",
    out_key="X_palantir_multiscale"
)

# Run Palantir
pr_res = palantir.core.run_palantir(
    adata,
    early_cell=start_cell,
    num_waypoints=1000,
    eigvec_key="X_palantir_multiscale"
)
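
Since the same key names recur across the three calls, one option is to keep them in a single place. The sketch below only builds keyword dictionaries from the key names given above. SCANPY_STYLE_KEYS and scanpy_style_kwargs are hypothetical helper names, and the commented-out call assumes the signatures shown in this answer.

```python
# Key names copied from the calls above, grouped by pipeline step.
SCANPY_STYLE_KEYS = {
    "run_diffusion_maps": {
        "eigvec_key": "X_palantir_diff_comp",
        "eigval_key": "palantir_EigenValues",
        "sim_key": "palantir_diff_op",
    },
    "determine_multiscale_space": {
        "eigvec_key": "X_palantir_diff_comp",
        "out_key": "X_palantir_multiscale",
    },
    "run_palantir": {
        "eigvec_key": "X_palantir_multiscale",
    },
}


def scanpy_style_kwargs(step):
    """Return a copy of the key arguments for one pipeline step."""
    return dict(SCANPY_STYLE_KEYS[step])


# Usage, assuming `import palantir` and an AnnData object `adata`:
# palantir.utils.run_diffusion_maps(
#     adata, n_components=20, knn=30,
#     **scanpy_style_kwargs("run_diffusion_maps"),
# )
```

Keeping the names together makes the chaining explicit: the out_key of the multiscale step is the eigvec_key that run_palantir reads.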

Note:

The palantir.core.run_palantir wrapper now additionally saves the results in adata.obs, adata.obsm, and adata.uns under the following keys:

adata.obs["palantir_pseudotime"]
adata.obs["palantir_entropy"]
adata.obsm["palantir_fate_probabilities"]
adata.uns["palantir_waypoints"]
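
A quick way to confirm that a run populated these keys is to check each slot for the expected names. This is a sketch only: missing_palantir_keys and EXPECTED_KEYS are hypothetical names, and the example runs against a stand-in object with plain dict slots rather than a real AnnData, which should support the same membership checks.

```python
from types import SimpleNamespace

# Key names as listed above, grouped by the AnnData slot they live in.
EXPECTED_KEYS = {
    "obs": ["palantir_pseudotime", "palantir_entropy"],
    "obsm": ["palantir_fate_probabilities"],
    "uns": ["palantir_waypoints"],
}


def missing_palantir_keys(adata):
    """List 'slot["key"]' entries from EXPECTED_KEYS not present on `adata`."""
    missing = []
    for slot, keys in EXPECTED_KEYS.items():
        container = getattr(adata, slot)
        for key in keys:
            if key not in container:
                missing.append(f'{slot}["{key}"]')
    return missing


# Stand-in object (not a real AnnData) with two of the four keys present.
fake = SimpleNamespace(
    obs={"palantir_pseudotime": [0.0, 1.0]},
    obsm={},
    uns={"palantir_waypoints": ["cell_a"]},
)
print(missing_palantir_keys(fake))
```

An empty list means every expected key was found.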

Answer 2 · 2023-09-14T18:37:48.000Z

I did not know scanpy is deprecating the external API. I followed the tutorial notebook and was able to reproduce my previous results with very similar code to what you provided here, so I have no issues.

I appreciate saving results in adata.obs and adata.obsm, as that saves a step later on. Generally this workflow is easier to understand as well.

My one note is that the documentation on your home page says that Palantir is fully integrated with scanpy, which is no longer the case.

Thank you for the comprehensive reply!