ventolab/CellphoneDB

UFuncTypeError when running cpdb analysis


I am new to single-cell sequencing analysis and to coding, and I am analyzing some sequencing data to compare interactions between cells.

I followed the CellphoneDB tutorials to prepare my data in both a Jupyter notebook and RStudio, and I get the same error whichever of the two I use to process and prepare the dataset for CellphoneDB. I am working on Windows. I set up the paths first:
[screenshot: setting up the file paths]

I also made sure that I can read in my data and that the dimensions match:
[screenshot: reading in the data and checking the dimensions]
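The exact check from the screenshot is not visible, so what follows is not the original code. It is a minimal sketch of a similar pre-flight check in pandas. It assumes, hypothetically, that the meta table holds barcodes in its first column and cell types in its second, and that the counts table is genes x cells with barcodes as column labels. The function name check_cpdb_inputs and the toy data are illustrative. The sketch also flags numeric cell type labels, because those turn out to cause the error discussed in this thread.

```python
import pandas as pd


def check_cpdb_inputs(meta: pd.DataFrame, counts: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical layout: meta has barcodes in column 0 and cell types in
    column 1; counts is genes x cells with barcodes as column labels.
    """
    problems = []
    barcodes = meta.iloc[:, 0].astype(str)
    cell_types = meta.iloc[:, 1]

    # Every barcode in meta should be a column of the counts matrix
    missing = set(barcodes) - set(counts.columns.astype(str))
    if missing:
        problems.append(f"{len(missing)} meta barcodes not found in counts columns")

    # The number of cells should agree between the two tables
    if len(barcodes) != counts.shape[1]:
        problems.append(f"meta has {len(barcodes)} cells but counts has {counts.shape[1]} columns")

    # Numeric cell type labels are parsed as int64 and break the "cellA|cellB" naming
    if pd.api.types.is_numeric_dtype(cell_types):
        problems.append("cell type labels are numeric; use text labels such as c1")

    return problems


# Toy example: barcodes match, but cell types are numeric cluster IDs
meta = pd.DataFrame({"Cell": ["bc1", "bc2"], "cell_type": [0, 1]})
counts = pd.DataFrame({"bc1": [0.5, 0.0], "bc2": [1.2, 3.4]}, index=["GENE_A", "GENE_B"])
print(check_cpdb_inputs(meta, counts))
# ['cell type labels are numeric; use text labels such as c1']
```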

The error occurs when I try to run the analysis:
[screenshot: the analysis call, whose code also appears in the traceback below]

I receive this error:
UFuncTypeError Traceback (most recent call last)
Cell In[63], line 8
1 ##### Step 3: Run basic analysis
2
3 # copying code from: https://github.com/ventolab/CellphoneDB/blob/master/notebooks/T01_Method1.ipynb
4 # tutorial for CellPhoneDB
6 from cellphonedb.src.core.methods import cpdb_analysis_method
----> 8 means, deconvoluted = cpdb_analysis_method.call(
9 cpdb_file_path = cpdb_file_path, # mandatory: CellPhoneDB database zip file.
10 meta_file_path = meta_file_path, # mandatory: tsv file defining barcodes to cell label.
11 counts_file_path = counts_file_path, # mandatory: normalized count matrix.
12 counts_data = 'hgnc_symbol', # defines the gene annotation in counts matrix.
13 output_path = out_path, # Path to save results microenvs_file_path = None,
14 separator = '|', # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
15 threshold = 0.1, # defines the min % of cells expressing a gene for this to be employed in the analysis.
16 result_precision = 3, # Sets the rounding for the mean values in significan_means.
17 debug = True, # Saves all intermediate tables emplyed during the analysis in pkl format.
18 output_suffix = None # Replaces the timestamp in the output files by a user defined string in the (default: None)
19 )

File ~\anaconda3\envs\cpdb\lib\site-packages\cellphonedb\src\core\methods\cpdb_analysis_method.py:116, in call(cpdb_file_path, meta_file_path, counts_file_path, counts_data, output_path, microenvs_file_path, separator, threshold, result_precision, debug, output_suffix)
110 cluster_interactions = cpdb_statistical_analysis_helper.get_cluster_combinations(clusters['names'], microenvs)
112 base_result = cpdb_statistical_analysis_helper.build_result_matrix(interactions_filtered,
113 cluster_interactions,
114 separator)
--> 116 mean_analysis = cpdb_statistical_analysis_helper.mean_analysis(interactions_filtered,
117 clusters,
118 cluster_interactions,
119 separator)
121 percent_analysis = cpdb_statistical_analysis_helper.percent_analysis(clusters,
122 threshold,
123 interactions_filtered,
124 cluster_interactions,
125 separator)
127 if debug:

File ~\anaconda3\envs\cpdb\lib\site-packages\cellphonedb\src\core\methods\cpdb_statistical_analysis_helper.py:359, in mean_analysis(interactions, clusters, cluster_combinations, separator)
353 x = clusters['means'].loc[gene1_ids, cluster1_names].values
354 y = clusters['means'].loc[gene2_ids, cluster2_names].values
356 result = pd.DataFrame(
357 (x > 0) * (y > 0) * (x + y) / 2,
358 index=interactions.index,
--> 359 columns=(pd.Series(cluster1_names) + separator + pd.Series(cluster2_names)).values)
361 return result

File ~\anaconda3\envs\cpdb\lib\site-packages\pandas\core\ops\common.py:81, in _unpack_zerodim_and_defer.<locals>.new_method(self, other)
77 return NotImplemented
79 other = item_from_zerodim(other)
---> 81 return method(self, other)

File ~\anaconda3\envs\cpdb\lib\site-packages\pandas\core\arraylike.py:186, in OpsMixin.__add__(self, other)
98 @unpack_zerodim_and_defer("__add__")
99 def __add__(self, other):
100 """
101 Get Addition of DataFrame and other, column-wise.
102
(...)
184 moose 3.0 NaN
185 """
--> 186 return self._arith_method(other, operator.add)

File ~\anaconda3\envs\cpdb\lib\site-packages\pandas\core\series.py:6112, in Series._arith_method(self, other, op)
6110 def _arith_method(self, other, op):
6111 self, other = ops.align_method_SERIES(self, other)
-> 6112 return base.IndexOpsMixin._arith_method(self, other, op)

File ~\anaconda3\envs\cpdb\lib\site-packages\pandas\core\base.py:1348, in IndexOpsMixin._arith_method(self, other, op)
1345 rvalues = ensure_wrapped_if_datetimelike(rvalues)
1347 with np.errstate(all="ignore"):
-> 1348 result = ops.arithmetic_op(lvalues, rvalues, op)
1350 return self._construct_result(result, name=res_name)

File ~\anaconda3\envs\cpdb\lib\site-packages\pandas\core\ops\array_ops.py:232, in arithmetic_op(left, right, op)
228 _bool_arith_check(op, left, right)
230 # error: Argument 1 to "_na_arithmetic_op" has incompatible type
231 # "Union[ExtensionArray, ndarray[Any, Any]]"; expected "ndarray[Any, Any]"
--> 232 res_values = _na_arithmetic_op(left, right, op) # type: ignore[arg-type]
234 return res_values

File ~\anaconda3\envs\cpdb\lib\site-packages\pandas\core\ops\array_ops.py:171, in _na_arithmetic_op(left, right, op, is_cmp)
168 func = partial(expressions.evaluate, op)
170 try:
--> 171 result = func(left, right)
172 except TypeError:
173 if not is_cmp and (is_object_dtype(left.dtype) or is_object_dtype(right)):
174 # For object dtype, fallback to a masked operation (only operating
175 # on the non-missing values)
176 # Don't do this for comparisons, as that will handle complex numbers
177 # incorrectly, see GH#32047

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('int64'), dtype('<U1')) -> None

I believe there is an issue with how my data was processed or saved, but I do not know where the issue is or how to fix it.
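The last line of the traceback identifies the failure. pandas is asked to add an int64 array (the cell type labels, parsed as integers) to a one-character string (dtype('<U1'), the '|' separator), and NumPy has no 'add' loop for that pair of types. The minimal sketch below reproduces this outside CellphoneDB. It assumes the labels were read as integers, and the variable values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical cluster labels read as integers, as happens when every cell type is numeric
cluster1_names = np.array([0, 0, 1, 1])
cluster2_names = np.array([0, 1, 0, 1])
separator = "|"

try:
    # Same expression as the columns= argument in mean_analysis
    pd.Series(cluster1_names) + separator + pd.Series(cluster2_names)
except TypeError as err:  # NumPy's UFuncTypeError is a subclass of TypeError
    print(f"{type(err).__name__}: {err}")

# With string labels the same concatenation succeeds
names = pd.Series(cluster1_names.astype(str)) + separator + pd.Series(cluster2_names.astype(str))
print(names.tolist())  # ['0|0', '0|1', '1|0', '1|1']
```

The fix cannot go inside CellphoneDB's own code, so the labels have to be non-numeric text in the input files themselves.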

Hi kpeeles01,

My apologies for not replying earlier. Please use strings rather than numbers to represent cell types in all your input files (meta, counts); that should make the above error go away. I will amend the CellphoneDB documentation to advise users against using numeric cell type identifiers.

Best wishes,

Robert.
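One way to follow this advice is to add a text prefix to every purely numeric label before writing the meta file. Converting with astype(str) alone is not enough. If the labels are written to a TSV as 0, 1, 2, pandas infers the column as integers again when the file is read back. Below is a minimal pandas sketch. It assumes the labels are in a column named cell_type. The function name relabel_numeric_cell_types and the "c" prefix are hypothetical choices.

```python
import io

import pandas as pd


def relabel_numeric_cell_types(meta: pd.DataFrame, column: str = "cell_type", prefix: str = "c") -> pd.DataFrame:
    """Return a copy of `meta` in which numeric cell type labels get a text prefix (1 -> c1)."""
    out = meta.copy()
    labels = out[column].astype(str)
    # Labels pandas could parse as numbers, e.g. "0", "12", "3.0"
    is_numeric = pd.to_numeric(labels, errors="coerce").notna()
    # Keep existing text labels unchanged; prefix only the numeric ones
    out[column] = labels.where(~is_numeric, prefix + labels)
    return out


# Hypothetical meta table with numeric cluster IDs
meta = pd.DataFrame({"Cell": ["bc1", "bc2", "bc3"], "cell_type": [0, 1, 2]})
fixed = relabel_numeric_cell_types(meta)

# Round-trip through an in-memory TSV to confirm the column is now read back as text
buf = io.StringIO()
fixed.to_csv(buf, sep="\t", index=False)
buf.seek(0)
reread = pd.read_csv(buf, sep="\t")
print(reread["cell_type"].tolist())  # ['c0', 'c1', 'c2']
```

In practice you would read your own meta file, apply the function, write the result with to_csv(..., sep="\t", index=False), and point meta_file_path at the new file. The same idea applies if the meta file is written from R.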

Hi kpeeles01,

In your original meta file (with numeric cell types), if you rename at least one of your cell types to text (e.g. 1 to c1), pandas will treat the whole column as text and the code should then work. I have tested this with a simple test_meta.txt file:

Cell cell_type
d-pos_AAACCTGAGCAGGTCA c1
d-pos_AAACCTGGTACCGAGA 0
d-pos_AAACCTGTCGCCATAA c1
d-pos_AAACGGGTCAGTTGAC 2
d-pos_AAAGATGCATTGAGCT 0
d-pos_AAAGATGTCCAAAGTC 0
d-pos_AAAGCAAAGAGGACGG 3
d-pos_AAAGCAACACATTCGA c1
d-pos_AAAGTAGAGAGCCCAA 0
d-pos_AAAGTAGCAAGCTGAG 0

and the following simple Python code:

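import pandas as pd
import numpy as np

# Read the tab-separated meta file, letting pandas infer the column dtypes
meta = pd.read_csv("test_meta.txt", sep='\t')
CELL_TYPE = 'cell_type'
# Treat cell types as categories; with all-numeric labels the categories are int64
meta[CELL_TYPE] = meta[CELL_TYPE].astype('category')
cluster_names = meta[CELL_TYPE].cat.categories
# Every ordered pair of cell types (the cluster combinations)
cluster_combinations = np.array(np.meshgrid(cluster_names.values, cluster_names.values)).T.reshape(-1, 2)
cluster1_names = cluster_combinations[:, 0]
cluster2_names = cluster_combinations[:, 1]
# Same separator as in the cpdb_analysis_method.call() above
separator = '|'
# Build the "cellA|cellB" names; this raises UFuncTypeError if the labels are int64
print(pd.Series(cluster1_names) + separator + pd.Series(cluster2_names))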

Before I changed 1 to c1 in the file, I got the same error as you; after the change, the code above worked.
Because this code mimics what happens inside CellphoneDB, your CellphoneDB analysis should work as well.
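To see both outcomes in one run, the sketch below passes two in-memory meta files through the same steps as the snippet above. One file has only numeric labels, and in the other a single label has been renamed to text. It needs only pandas and NumPy. io.StringIO stands in for test_meta.txt, and the helper name pair_names is hypothetical.

```python
import io

import numpy as np
import pandas as pd

ALL_NUMERIC = "Cell\tcell_type\nbc1\t1\nbc2\t0\nbc3\t2\n"
ONE_TEXT = "Cell\tcell_type\nbc1\tc1\nbc2\t0\nbc3\t2\n"


def pair_names(meta_text: str, separator: str = "|") -> pd.Series:
    """Build 'cellA|cellB' names for every ordered pair of cell types."""
    meta = pd.read_csv(io.StringIO(meta_text), sep="\t")
    # int64 categories if every label is numeric, text categories otherwise
    names = meta["cell_type"].astype("category").cat.categories.values
    combos = np.array(np.meshgrid(names, names)).T.reshape(-1, 2)
    return pd.Series(combos[:, 0]) + separator + pd.Series(combos[:, 1])


for label, text in [("all numeric", ALL_NUMERIC), ("one text label", ONE_TEXT)]:
    try:
        print(label, "->", pair_names(text).tolist())
    except TypeError as err:  # NumPy's UFuncTypeError subclasses TypeError
        print(label, "-> fails:", err)
```

A single text label is enough because read_csv infers one dtype per column. When any value is non-numeric, the whole column is read as text.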

Could you give it a go and let me know how you get on?

Best,

Robert.