welch-lab/MultiVelo

error while calculating latent_time

SinghalDivya opened this issue · 17 comments

Hi,
Thank you for creating an interesting tool. I am running into a following error when I ran latent time. Could you please guide me through this error.

mv.latent_time(adata_result)

computing velocity graph (using 1/12 cores)
100%
4060/4060 [00:07<00:00, 520.87cells/s]

ValueError Traceback (most recent call last)
Cell In[113], line 1
----> 1 mv.latent_time(adata_result)

File /work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/multivelo/dynamical_chrom_func.py:3097, in latent_time(adata, vkey, **kwargs)
3095 raise ValueError('Normalized velocity matrix is not found. Please multivelo.run velocity_graph function first.')
3096 if vkey+'_norm_graph' not in adata.uns.keys():
-> 3097 velocity_graph(adata, vkey=vkey, **kwargs)
3098 scv.tl.latent_time(adata, vkey=vkey+'_norm', **kwargs)

File /work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/multivelo/dynamical_chrom_func.py:3039, in velocity_graph(adata, vkey, xkey, **kwargs)
3037 if vkey+'_norm_genes' not in adata.var.columns:
3038 adata.var[vkey+'_norm_genes'] = adata.var[vkey+'_genes']
-> 3039 scv.tl.velocity_graph(adata, vkey=vkey+'_norm', xkey=xkey, **kwargs)

File /work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/scvelo/tools/velocity_graph.py:363, in velocity_graph(data, vkey, xkey, tkey, basis, n_neighbors, n_recurse_neighbors, random_neighbors_at_max, sqrt_transform, variance_stabilization, gene_subset, compute_uncertainties, approx, mode_neighbors, copy, n_jobs, backend)
359 n_jobs = get_n_jobs(n_jobs=n_jobs)
360 logg.info(
361 f"computing velocity graph (using {n_jobs}/{os.cpu_count()} cores)", r=True
362 )
--> 363 vgraph.compute_cosines(n_jobs=n_jobs, backend=backend)
365 adata.uns[f"{vkey}_graph"] = vgraph.graph
366 adata.uns[f"{vkey}_graph_neg"] = vgraph.graph_neg

File /work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/scvelo/tools/velocity_graph.py:175, in VelocityGraph.compute_cosines(self, n_jobs, backend)
172 n_obs = self.X.shape[0]
174 # TODO: Use batches and vectorize calculation of dX in self._calculate_cosines
--> 175 res = parallelize(
176 self._compute_cosines,
177 range(self.X.shape[0]),
178 n_jobs=n_jobs,
179 unit="cells",
180 backend=backend,
181 )()
182 uncertainties, vals, rows, cols = map(_flatten, zip(*res))
184 vals = np.hstack(vals)

File /work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/scvelo/core/_parallelize.py:138, in parallelize..wrapper(*args, **kwargs)
126 pbar, queue, thread = None, None, None
128 res = Parallel(n_jobs=n_jobs, backend=backend)(
129 delayed(callback)(
130 *((i, cs) if use_ixs else (cs,)),
(...)
135 for i, cs in enumerate(collections)
136 )
--> 138 res = np.array(res) if as_array else res
139 if thread is not None:
140 thread.join()

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (1, 4) + inhomogeneous part.

####### my adata_result looks like the following :

AnnData object with n_obs × n_vars = 4060 × 718
obs: 'n_counts', 'module'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'fit_alpha_c', 'fit_alpha', 'fit_beta', 'fit_gamma', 'fit_t_sw1', 'fit_t_sw2', 'fit_t_sw3', 'fit_scale_cc', 'fit_rescale_c', 'fit_rescale_u', 'fit_alignment_scaling', 'fit_model', 'fit_direction', 'fit_loss', 'fit_likelihood', 'fit_likelihood_c', 'fit_ssd_c', 'fit_var_c', 'fit_c0', 'fit_u0', 'fit_s0', 'fit_anchor_min_idx', 'fit_anchor_max_idx', 'fit_anchor_velo_min_idx', 'fit_anchor_velo_max_idx', 'velo_s_genes', 'velo_u_genes', 'velo_chrom_genes', 'velo_s_norm_genes'
uns: 'module_colors', 'neighbors', 'pca', 'umap', 'velo_chrom_params', 'velo_s_params', 'velo_u_params', 'velo_s_norm_params'
obsm: 'X_pca', 'X_umap'
varm: 'PCs', 'fit_anchor_c', 'fit_anchor_c_sw', 'fit_anchor_c_velo', 'fit_anchor_s', 'fit_anchor_s_sw', 'fit_anchor_s_velo', 'fit_anchor_u', 'fit_anchor_u_sw', 'fit_anchor_u_velo'
layers: 'ATAC', 'Ms', 'Mu', 'ambiguous', 'fit_state', 'fit_t', 'matrix', 'spliced', 'unspliced', 'velo_chrom', 'velo_s', 'velo_u', 'velo_s_norm'
obsp: '_ATAC_conn', '_RNA_conn', 'connectivities', 'distances'

It's not immediately clear to me what the cause of the issue is since it comes from scVelo (a package Multivelo depends on). Can you try scv.tl.velocity_graph(adata_result, vkey='velo_s') directly?

I don't see the connection between the two yet. What's the output of adata_result.obs['module'].cat.categories and adata_result.obs['module'].cat.categories.astype(str) ? Is it something reasonable?
Also, which version of pandas did you use?

I am using pandas version:

pd.__version__
'2.0.1'

also the output of both suggested commands are same.

adata_result.obs['module'].cat.categories 
Index(['module_1', 'module_2', 'rest'], dtype='object')

adata_result.obs['module'].cat.categories.astype(str)
Index(['module_1', 'module_2', 'rest'], dtype='object')

Can you try installing the last pandas 1.x?

I installed different pandas version, and reran the whole pipeline, still gets the same error.

pd.__version__
'1.5.3'

however, I got a new warning saying:

/work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/umap/distances.py:1063: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
@numba.jit()

The warning is probably unrelated.
If you run scvelo's pipeline using the same input RNA data of multivelo
scv.tl.recover_dynamics(adata)

scv.tl.velocity(adata, mode='dynamical')

scv.tl.velocity_graph(adata)
Do you get the same error?

tried

scv.tl.recover_dynamics(adata)

recovering dynamics (using 1/12 cores)
100%
180/180 [02:47<00:00, 1.34gene/s]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[53], line 1
----> 1 scv.tl.recover_dynamics(adata_result)
      3 #scv.tl.velocity(adata, mode='dynamical')
      4 
      5 #scv.tl.velocity_graph(adata)

File /work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/scvelo/tools/dynamical_model.py:588, in recover_dynamics(data, var_names, n_top_genes, max_iter, assignment_mode, t_max, fit_time, fit_scaling, fit_steady_states, fit_connected_states, fit_basal_transcription, use_raw, load_pars, return_model, plot_results, steady_state_prior, add_key, copy, n_jobs, backend, **kwargs)
    585     adata.varm["loss"] = loss
    587 if t_max is not False:
--> 588     dm = align_dynamics(adata, t_max=t_max, dm=dm, idx=idx)
    590 logg.info("    finished", time=True, end=" " if settings.verbosity > 2 else "\n")
    591 logg.hint(
    592     "added \n"
    593     f"    '{add_key}_pars', "
    594     f"fitted parameters for splicing dynamics (adata.var)"
    595 )

File /work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/scvelo/tools/dynamical_model.py:721, in align_dynamics(data, t_max, dm, idx, mode, remove_outliers, copy)
    718 if dm is not None and dm.recoverable:
    719     dm.m = m[idx]
    720     dm.alpha, dm.beta, dm.gamma, dm.pars[:3] = (
--> 721         np.array([dm.alpha, dm.beta, dm.gamma, dm.pars[:3]]) / dm.m[-1]
    722     )
    723     dm.t, dm.tau, dm.t_, dm.pars[4] = (
    724         np.array([dm.t, dm.tau, dm.t_, dm.pars[4]]) * dm.m[-1]
    725     )
    727 return adata if copy else dm

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.
scv.tl.velocity(adata_result, mode='dynamical')

computing velocities
    finished (0:00:04) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
scv.tl.velocity_graph(adata_result)

computing velocity graph (using 1/12 cores)
100%
4060/4060 [00:06<00:00, 615.03cells/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[55], line 1
----> 1 scv.tl.velocity_graph(adata_result)

File /work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/scvelo/tools/velocity_graph.py:363, in velocity_graph(data, vkey, xkey, tkey, basis, n_neighbors, n_recurse_neighbors, random_neighbors_at_max, sqrt_transform, variance_stabilization, gene_subset, compute_uncertainties, approx, mode_neighbors, copy, n_jobs, backend)
    359 n_jobs = get_n_jobs(n_jobs=n_jobs)
    360 logg.info(
    361     f"computing velocity graph (using {n_jobs}/{os.cpu_count()} cores)", r=True
    362 )
--> 363 vgraph.compute_cosines(n_jobs=n_jobs, backend=backend)
    365 adata.uns[f"{vkey}_graph"] = vgraph.graph
    366 adata.uns[f"{vkey}_graph_neg"] = vgraph.graph_neg

File /work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/scvelo/tools/velocity_graph.py:175, in VelocityGraph.compute_cosines(self, n_jobs, backend)
    172 n_obs = self.X.shape[0]
    174 # TODO: Use batches and vectorize calculation of dX in self._calculate_cosines
--> 175 res = parallelize(
    176     self._compute_cosines,
    177     range(self.X.shape[0]),
    178     n_jobs=n_jobs,
    179     unit="cells",
    180     backend=backend,
    181 )()
    182 uncertainties, vals, rows, cols = map(_flatten, zip(*res))
    184 vals = np.hstack(vals)

File /work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/scvelo/core/_parallelize.py:138, in parallelize.<locals>.wrapper(*args, **kwargs)
    126     pbar, queue, thread = None, None, None
    128 res = Parallel(n_jobs=n_jobs, backend=backend)(
    129     delayed(callback)(
    130         *((i, cs) if use_ixs else (cs,)),
   (...)
    135     for i, cs in enumerate(collections)
    136 )
--> 138 res = np.array(res) if as_array else res
    139 if thread is not None:
    140     thread.join()

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (1, 4) + inhomogeneous part.

Can you run recover_dynamics, velocity, and velocity_graph function all on the original adata object? I noticed that you ran velocity and velocity_graph on multivelo output instead. This is just to check if the error indeed comes from multivelo rather than scvelo. Thanks.

If I understood correctly, I ran the pipeline scvelo using my dataset, and that gave me the output.

scv.tl.recover_dynamics(adata)

recovering dynamics (using 1/12 cores)
    finished (0:15:10) --> added 
    'fit_pars', fitted parameters for splicing dynamics (adata.var)
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)

computing velocities
    finished (0:00:32) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/12 cores)

/work/gallo_lab/divya/softwares/miniconda3/envs/scvelo/lib/python3.8/site-packages/scvelo/core/_parallelize.py:138: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  res = np.array(res) if as_array else res
    finished (0:00:13) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)

but, when I ran just adata_rna using multivelo

it gave me the same error:

scv.tl.recover_dynamics(adata_rna)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[56], line 1
----> 1 scv.tl.recover_dynamics(adata_rna)

File /work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/scvelo/tools/dynamical_model.py:588, in recover_dynamics(data, var_names, n_top_genes, max_iter, assignment_mode, t_max, fit_time, fit_scaling, fit_steady_states, fit_connected_states, fit_basal_transcription, use_raw, load_pars, return_model, plot_results, steady_state_prior, add_key, copy, n_jobs, backend, **kwargs)
    585     adata.varm["loss"] = loss
    587 if t_max is not False:
--> 588     dm = align_dynamics(adata, t_max=t_max, dm=dm, idx=idx)
    590 logg.info("    finished", time=True, end=" " if settings.verbosity > 2 else "\n")
    591 logg.hint(
    592     "added \n"
    593     f"    '{add_key}_pars', "
    594     f"fitted parameters for splicing dynamics (adata.var)"
    595 )

File /work/gallo_lab/divya/softwares/miniconda3/envs/multi_velo/lib/python3.8/site-packages/scvelo/tools/dynamical_model.py:721, in align_dynamics(data, t_max, dm, idx, mode, remove_outliers, copy)
    718 if dm is not None and dm.recoverable:
    719     dm.m = m[idx]
    720     dm.alpha, dm.beta, dm.gamma, dm.pars[:3] = (
--> 721         np.array([dm.alpha, dm.beta, dm.gamma, dm.pars[:3]]) / dm.m[-1]
    722     )
    723     dm.t, dm.tau, dm.t_, dm.pars[4] = (
    724         np.array([dm.t, dm.tau, dm.t_, dm.pars[4]]) * dm.m[-1]
    725     )
    727 return adata if copy else dm

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (4,) + inhomogeneous part.

You mean ran just adata_rna using scVelo? scv.tl.recover_dynamics(adata_rna) is a scVelo function.
What's the difference between adata_rna and adata? That may be what is causing the issue.

This seems to be scVelo issue, see theislab/scvelo#1058. The scVelo team is working on a fix.
I still need to look into it more closely to see if this is a common issue also affecting MultiVelo.

Per scVelo's issue thread, one possible quick fix is to downgrade numpy to 1.23.5. You can try that.

No problem. Issue closed.