Problems with the tensor pipeline at the Tensor Factorization step
Closed this issue · 4 comments
Hi,
I am trying to run Tensor-cell2cell on a GPU with my mouse single-cell data (ligand-receptor pairs were downloaded from https://raw.githubusercontent.com/LewisLabUCSD/Ligand-Receptor-Pairs/master/Mouse/Mouse-2020-Jin-LR-pairs.csv). While I did not get errors when running your examples, I ran into trouble at the pipeline step below:
tensor2 = c2c.analysis.run_tensor_cell2cell_pipeline(tensor,
meta_tf,
copy_tensor=True, # Whether to output a new tensor or modifying the original
rank= None, # Number of factors to perform the factorization. If None, it is automatically determined by an elbow analysis
tf_optimization='regular', # To define how robust we want the analysis to be.
random_state=888, # Random seed for reproducibility
backend='pytorch', # This enables a backend that supports using a GPU.
device='cuda', # Device to use. If using GPU and PyTorch, use 'cuda'. For CPU use 'cpu'
elbow_metric='error', # Metric to use in the elbow analysis.
smooth_elbow=False, # Whether smoothing the metric of the elbow analysis.
upper_rank=25, # Max number of factors to try in the elbow analysis
tf_init='random', # Initialization method of the tensor factorization
tf_svd='numpy_svd', # Type of SVD to use if the initialization is 'svd'
cmaps=None, # Color palettes to use in color each of the dimensions. Must be a list of palettes.
sample_col='Element', # Columns containing the elements in the tensor metadata
group_col='Category', # Columns containing the major groups in the tensor metadata
fig_fontsize=14, # Fontsize of the figures generated
output_folder=output_folder, # Whether to save the figures and loadings in files. If so, a folder pathname must be passed
output_fig=True, # Whether to output the figures. If False, figures won't be saved as files even if a folder was passed in output_folder.
fig_format='pdf', # File format of the figures.
)
In the end I only get the elbow plot with the rank. The pipeline starts the tensor factorization, but it stops with the following error:
Running Elbow Analysis
100%|██████████| 25/25 [05:41<00:00, 13.68s/it]
The rank at the elbow is: 8
Running Tensor Factorization
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[27], line 1
----> 1 tensor2 = c2c.analysis.run_tensor_cell2cell_pipeline(tensor,
2 meta_tf,
3 copy_tensor=True, # Whether to output a new tensor or modifying the original
4 rank= None, # Number of factors to perform the factorization. If None, it is automatically determined by an elbow analysis
5 tf_optimization='regular', # To define how robust we want the analysis to be.
6 random_state=888, # Random seed for reproducibility
7 backend='pytorch', # This enables a banckend that supports using a GPU.
8 device='cuda', # Device to use. If using GPU and PyTorch, use 'cuda'. For CPU use 'cpu'
9 elbow_metric='error', # Metric to use in the elbow analysis.
10 smooth_elbow=False, # Whether smoothing the metric of the elbow analysis.
11 upper_rank=25, # Max number of factors to try in the elbow analysis
12 tf_init='random', # Initialization method of the tensor factorization
13 tf_svd='numpy_svd', # Type of SVD to use if the initialization is 'svd'
14 cmaps=None, # Color palettes to use in color each of the dimensions. Must be a list of palettes.
15 sample_col='Element', # Columns containing the elements in the tensor metadata
16 group_col='Category', # Columns containing the major groups in the tensor metadata
17 fig_fontsize=14, # Fontsize of the figures generated
18 output_folder=output_folder, # Whether to save the figures and loadings in files. If so, a folder pathname must be passed
19 output_fig=True, # Whether to output the figures. If False, figures won't be saved a files if a folder was passed in output_folder.
20 fig_format='pdf', # File format of the figures.
21 )
File /opt/tools/deg/miniforge3/envs/cell2cell/lib/python3.10/site-packages/cell2cell/analysis/tensor_pipelines.py:191, in run_tensor_cell2cell_pipeline(interaction_tensor, tensor_metadata, copy_tensor, rank, tf_optimization, random_state, backend, device, elbow_metric, smooth_elbow, upper_rank, tf_init, tf_svd, cmaps, sample_col, group_col, fig_fontsize, output_folder, output_fig, fig_format, **kwargs)
189 # Factorization
190 print('Running Tensor Factorization')
--> 191 interaction_tensor.compute_tensor_factorization(rank=rank,
192 init=tf_init,
193 svd=tf_svd,
194 random_state=random_state,
195 runs=tf_runs,
196 normalize_loadings=True,
197 tol=tol, n_iter_max=n_iter_max,
198 **kwargs
199 )
201 ### EXPORT RESULTS ###
202 if output_folder is not None:
File /opt/tools/deg/miniforge3/envs/cell2cell/lib/python3.10/site-packages/cell2cell/tensor/tensor.py:361, in BaseTensor.compute_tensor_factorization(self, rank, tf_type, init, svd, random_state, runs, normalize_loadings, var_ordered_factors, n_iter_max, tol, verbose, **kwargs)
356 self.explained_variance_ratio_ = None
358 self.explained_variance_ = self.explained_variance()
360 self.factors = OrderedDict(zip(order_labels,
--> 361 [pd.DataFrame(tl.to_numpy(f), index=idx, columns=factor_names) for f, idx in zip(factors, self.order_names)]))
362 self.rank = rank
File /opt/tools/deg/miniforge3/envs/cell2cell/lib/python3.10/site-packages/cell2cell/tensor/tensor.py:361, in <listcomp>(.0)
356 self.explained_variance_ratio_ = None
358 self.explained_variance_ = self.explained_variance()
360 self.factors = OrderedDict(zip(order_labels,
--> 361 [pd.DataFrame(tl.to_numpy(f), index=idx, columns=factor_names) for f, idx in zip(factors, self.order_names)]))
362 self.rank = rank
File /opt/tools/deg/miniforge3/envs/cell2cell/lib/python3.10/site-packages/pandas/core/frame.py:694, in DataFrame.__init__(self, data, index, columns, dtype, copy)
684 mgr = dict_to_mgr(
685 # error: Item "ndarray" of "Union[ndarray, Series, Index]" has no
686 # attribute "name"
(...)
691 typ=manager,
692 )
693 else:
--> 694 mgr = ndarray_to_mgr(
695 data,
696 index,
697 columns,
698 dtype=dtype,
699 copy=copy,
700 typ=manager,
701 )
703 # For data is list-like, or Iterable (will consume into list)
704 elif is_list_like(data):
File /opt/tools/deg/miniforge3/envs/cell2cell/lib/python3.10/site-packages/pandas/core/internals/construction.py:351, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
346 # _prep_ndarray ensures that values.ndim == 2 at this point
347 index, columns = _get_axes(
348 values.shape[0], values.shape[1], index=index, columns=columns
349 )
--> 351 _check_values_indices_shape_match(values, index, columns)
353 if typ == "array":
355 if issubclass(values.dtype.type, str):
File /opt/tools/deg/miniforge3/envs/cell2cell/lib/python3.10/site-packages/pandas/core/internals/construction.py:422, in _check_values_indices_shape_match(values, index, columns)
420 passed = values.shape
421 implied = (len(index), len(columns))
--> 422 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")
ValueError: Shape of passed values is (1000, 8), indices imply (1999, 8)
If you'd like to check the object I am using, here is the Dropbox link:
https://www.dropbox.com/scl/fo/qq37tda3yekx0nl073xsb/ACgPNftevARe-Vo1mdt2I1Q?rlkey=0ii3b2z6awx70r4jnpgloubgd&dl=0
Thank you
Sorry, I cannot reproduce your error with just the h5ad file. Could you export your tensor and upload that file instead? Here is an example of how to do so: https://colab.research.google.com/drive/1T6MUoxafTHYhjvenDbEtQoveIlHT2U6_#scrollTo=JD8Si50x1jq-
Also, could you check your tensor size and the length of the names you passed for each element in your tensor?
You can do this with these commands:
# Tensor shape
tensor.shape
# Length of Labels for contexts
len(tensor.order_names[0])
# Length of Labels for ligand-receptor pairs
len(tensor.order_names[1])
# Length of Labels for sender cells
len(tensor.order_names[2])
# Length of Labels for receiver cells
len(tensor.order_names[3])
I think your issue could be related to passing fewer or more labels than the actual number of elements in one of the tensor dimensions. From the size, it could be the ligand-receptor pairs. The error reports factor values of shape (1000, 8) against an index implying (1999, 8), so it seems like you provided labels for 1999 LR pairs, while your tensor has only 1000 elements in that dimension.
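The manual checks above can also be automated. The following is a minimal sketch, not part of cell2cell: `find_label_mismatches` is a hypothetical helper, and the toy sizes below are made up except for the 1000 vs. 1999 mismatch taken from the traceback.

```python
def find_label_mismatches(shape, order_names):
    """Return (axis, n_elements, n_labels) for every axis whose number of
    labels differs from the size of that tensor dimension."""
    if len(shape) != len(order_names):
        raise ValueError(
            f"Tensor has {len(shape)} dimensions but {len(order_names)} label lists were given"
        )
    return [
        (axis, size, len(labels))
        for axis, (size, labels) in enumerate(zip(shape, order_names))
        if size != len(labels)
    ]


# Toy stand-ins (hypothetical): contexts, LR pairs, sender cells, receiver cells
shape = (4, 1000, 5, 5)
order_names = [
    [f"context_{i}" for i in range(4)],
    [f"lr_pair_{i}" for i in range(1999)],  # too many labels for this axis
    [f"sender_{i}" for i in range(5)],
    [f"receiver_{i}" for i in range(5)],
]

mismatches = find_label_mismatches(shape, order_names)
for axis, n_elements, n_labels in mismatches:
    print(f"Axis {axis}: tensor has {n_elements} elements but {n_labels} labels")
```

With a real tensor, the same check would presumably be `find_label_mismatches(tensor.shape, tensor.order_names)`; an empty list means every dimension has as many labels as elements.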
Hi Erik!
At the following link you can find the tensor and tensor metadata: https://www.dropbox.com/scl/fo/qq37tda3yekx0nl073xsb/ACgPNftevARe-Vo1mdt2I1Q?rlkey=ans4a1sdnxbb3d9b77vv1kdv4&dl=0
# Tensor shape
tensor.shape
(13, 1000, 28, 28)
I have to tell you that, for a reason I could not figure out, on the first attempt the labels for ligand-receptor pairs
were all capital letters (as for human), while in ppi_names I had them in the correct format for mouse (i.e., Kdm5d^Whatever). To correct this, since it would not work otherwise, I just replaced them:
tensor.order_names[1] = ppi_names  # whose length is 1999!!!
So I think that, in trying to correct the labels in the metadata, I generated this new error!
Is there a way to make the tensor create the labels for ligand-receptor pairs in the correct format?
Thank you very much for the help
Ludovica
I see! You are using the tensor-cell2cell analysis without LIANA, right? If so, in the step of creating the interaction tensor you need to add upper_letter_comparison=False
to keep the names in the original format, otherwise they will be transformed to capital letters.
For example:
tensor = c2c.tensor.InteractionTensor(rnaseq_matrices=rnaseq_matrices,
ppi_data=lr_pairs,
context_names=list(context_dict.keys()),
how='outer',
outer_fraction=0.5, # Considers elements in at least 50% of samples
complex_sep='&',
interaction_columns=int_columns,
communication_score='expression_gmean',
upper_letter_comparison=False
)
Then there is no need to do tensor.order_names[1] = ppi_names
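If rebuilding the tensor is not convenient, a rough alternative is to map each upper-cased label back to its original spelling, so the number of labels stays equal to the size of the tensor dimension. This is only a sketch: `restore_original_case` is a hypothetical helper, the names below are made up, and it assumes the upper-cased labels are exactly `name.upper()` of the original names, which is not confirmed for cell2cell. Passing `upper_letter_comparison=False` remains the simpler route.

```python
def restore_original_case(upper_labels, original_names):
    """Map upper-cased labels back to their original-case names, keeping
    the order and the number of labels unchanged."""
    # Assumption: each label equals the upper-cased form of one original name
    lookup = {name.upper(): name for name in original_names}
    missing = [label for label in upper_labels if label not in lookup]
    if missing:
        raise KeyError(f"No original-case name found for: {missing}")
    return [lookup[label] for label in upper_labels]


# Hypothetical example: the full name list is longer than the tensor's labels
upper_labels = ["KDM5D^WHATEVER", "TGFB1^TGFBR1"]
ppi_names = ["Tgfb1^Tgfbr1", "Kdm5d^Whatever", "Wnt5a^Fzd5"]

restored = restore_original_case(upper_labels, ppi_names)
print(restored)
```

Unlike assigning the full list of names, this keeps one label per element already in the tensor, so the lengths cannot drift apart.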
Yes, I am using Tensor-cell2cell without LIANA, and yes, upper_letter_comparison=False
solved the issue! Thank you for the help :)
Ludovica