KeyError: 'VEGFA_FLT1_complex'
PattF opened this issue · 9 comments
I'm fairly new to working with scRNA-seq data. I'm trying to use CellphoneDB on a drug-treated vs. untreated dataset that was processed with Parse. I generated my metadata and expression matrix files, but when I tried to run Method 2, I got the error below.
I'd appreciate any help figuring out where I went wrong. I can send whichever files are needed to help debug the error. Thanks in advance!
Reading user files...
The following user files were loaded successfully:
C:/data/expression_matrix_2.csv
C:/data/metadata_1.csv
[ ][CORE][14/11/23-16:35:09][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:42 Threads:5 Precision:3
[ ][CORE][14/11/23-16:35:09][WARNING] Debug random seed enabled. Set to 42
[ ][CORE][14/11/23-16:35:10][INFO] Running Real Analysis
[ ][CORE][14/11/23-16:35:10][INFO] Running Statistical Analysis
100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:35<00:00, 28.22it/s]
[ ][CORE][14/11/23-16:35:46][INFO] Building Pvalues result
[ ][CORE][14/11/23-16:35:46][INFO] Building results
[ ][CORE][14/11/23-16:35:47][INFO] Scoring interactions: Filtering genes per cell type..
100%|█████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 289.31it/s]
[ ][CORE][14/11/23-16:35:47][INFO] Scoring interactions: Calculating mean expression of each gene per group/cell type..
100%|█████████████████████████████████████████████████████████████████████████████████| 32/32 [00:00<00:00, 668.78it/s]
C:\Users\anaconda3\Lib\site-packages\cellphonedb\utils\scoring_utils.py:103: RuntimeWarning:
invalid value encountered in power
[ ][CORE][14/11/23-16:35:47][INFO] Scoring interactions: Calculating scores for all interactions and cell types..
100%|██████████████████████████████████████████████████████████████████████████████| 1024/1024 [00:21<00:00, 48.13it/s]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[38], line 11
8 from cellphonedb.src.core.methods import cpdb_statistical_analysis_method
10 # Call the method with adjusted file paths
---> 11 cpdb_results = cpdb_statistical_analysis_method.call(
12 cpdb_file_path=cpdb_file_path,
13 meta_file_path=meta_file_path,
14 counts_file_path=counts_file_path,
15 counts_data='hgnc_symbol',
16 # Omitting the optional files active_tfs_file_path and microenvs_file_path
17 score_interactions=True,
18 iterations=1000,
19 threshold=0.1,
20 threads=5,
21 debug_seed=42,
22 result_precision=3,
23 pvalue=0.05,
24 subsampling=False,
25 subsampling_log=False,
26 subsampling_num_pc=100,
27 subsampling_num_cells=1000,
28 separator='|',
29 debug=False,
30 output_path=out_path,
31 output_suffix=None
32 )
File ~\anaconda3\Lib\site-packages\cellphonedb\src\core\methods\cpdb_statistical_analysis_method.py:157, in call(cpdb_file_path, meta_file_path, counts_file_path, counts_data, output_path, microenvs_file_path, active_tfs_file_path, iterations, threshold, threads, debug_seed, result_precision, pvalue, subsampling, subsampling_log, subsampling_num_pc, subsampling_num_cells, separator, debug, output_suffix, score_interactions)
154 if score_interactions:
155 # Make sure all cell types are strings
156 meta['cell_type'] = meta['cell_type'].apply(str)
--> 157 interaction_scores = scoring_utils.score_interactions_based_on_participant_expressions_product(
158 cpdb_file_path, counts4scoring, means_result.copy(), separator, meta, threshold, "cell_type", threads)
159 analysis_result['interaction_scores'] = interaction_scores
161 file_utils.save_dfs_as_tsv(output_path, output_suffix, "statistical_analysis", analysis_result)
File ~\anaconda3\Lib\site-packages\cellphonedb\utils\scoring_utils.py:344, in score_interactions_based_on_participant_expressions_product(cpdb_file_path, counts, means, separator, metadata, threshold, cell_type_col_name, threads)
340 cpdb_fms = scale_expression(cpdb_fmsh,
341 upper_range=10)
343 # Step 5: calculate the ligand-receptor score.
--> 344 interaction_scores = score_product(matrix=cpdb_fms,
345 means=means,
346 separator=separator,
347 interactions=interactions,
348 id2name=id2name,
349 threads=threads)
350 return interaction_scores
File ~\anaconda3\Lib\site-packages\cellphonedb\utils\scoring_utils.py:290, in score_product(matrix, interactions, means, separator, id2name, threads)
288 for ct_pair, lr_scores_filtered in results:
289 interacting_pair2score = dict(zip(lr_scores_filtered['interacting_pair'], lr_scores_filtered['score']))
--> 290 interaction_scores[ct_pair] = [interacting_pair2score[id] for id in interaction_scores['interacting_pair']]
292 return interaction_scores
File ~\anaconda3\Lib\site-packages\cellphonedb\utils\scoring_utils.py:290, in <listcomp>(.0)
288 for ct_pair, lr_scores_filtered in results:
289 interacting_pair2score = dict(zip(lr_scores_filtered['interacting_pair'], lr_scores_filtered['score']))
--> 290 interaction_scores[ct_pair] = [interacting_pair2score[id] for id in interaction_scores['interacting_pair']]
292 return interaction_scores
KeyError: 'VEGFA_FLT1_complex'
Hi PattF,
Thank you for using CellphoneDB and for your inquiry. Would you mind sending a link to the files you used in the analysis to contact@cellphonedb.org? I will then take a closer look and get back to you. Many thanks.
Best,
Robert.
Thanks Robert! Just sent an email with the requested files.
best,
Patrick
Hi Patrick,
Thanks for sharing your input files with us. I notice negative values in your counts file, from which I infer that you may have scaled the counts before analysing them with CellphoneDB. These negative values are what cause the error above. The counts should be normalised, but not scaled, before being passed to CellphoneDB. Hope this helps.
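A quick pre-flight check can catch this before a long run. The sketch below assumes the counts are loaded into a pandas DataFrame with genes as rows and cells as columns; the helper name find_negative_counts and the toy matrix are hypothetical, not part of CellphoneDB. (As a side note, the "RuntimeWarning: invalid value encountered in power" in the log is the warning NumPy emits when, for example, a negative float is raised to a fractional power, which is consistent with this diagnosis, although the exact computation at that line is not shown here.)

```python
import numpy as np
import pandas as pd


def find_negative_counts(counts: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: list every negative entry of a genes x cells matrix.

    An empty result means the matrix contains no negative values.
    """
    values = counts.to_numpy(dtype=float)
    gene_idx, cell_idx = np.nonzero(values < 0)
    return pd.DataFrame({
        "gene": counts.index[gene_idx],
        "cell": counts.columns[cell_idx],
        "value": values[gene_idx, cell_idx],
    })


# Hypothetical toy matrix standing in for a counts file
# (genes as rows, cell barcodes as columns).
counts = pd.DataFrame(
    {"cell_1": [0.0, 1.2, 0.0], "cell_2": [-0.4, 0.8, 2.1]},
    index=["VEGFA", "FLT1", "KDR"],
)

bad = find_negative_counts(counts)
if bad.empty:
    print("No negative values found.")
else:
    # Negative values usually mean the matrix was scaled (e.g. z-scored).
    print(f"{len(bad)} negative value(s) found; the matrix looks scaled:")
    print(bad.to_string(index=False))
```

For a real file, something like pd.read_csv("C:/data/expression_matrix_2.csv", index_col=0) should produce the same layout, assuming the first column holds the gene identifiers. If any rows are reported, re-export the normalised (unscaled) matrix rather than patching the values.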
Best,
Robert.
Thanks for checking, Robert! Right, so I processed my expression matrix input the wrong way. The counts can't be scaled, only normalized; but what if they have also been log-transformed?
Can the file include any other form of preprocessing/filtering besides normalization?
Thanks!
Patrick
Hi Patrick,
The error is thrown by the scoring functionality (cf. score_interactions=True in your cpdb_statistical_analysis_method.call() above). The documentation at https://cellphonedb.readthedocs.io/en/latest/RESULTS-DOCUMENTATION.html#method-2-statistical-inference-of-interaction-specificity states the following: 'To score interactions, CellphoneDB requires log-normalized expression data, any normalisation procedure (i.e. z-scaling) that transforms zeros to any other value must be avoided.'
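To see why z-scaling is ruled out, consider one gene across a handful of cells. The values below are hypothetical; the point is only that subtracting the per-gene mean and dividing by the standard deviation moves every zero (no detected expression) to a nonzero, here negative, value.

```python
import numpy as np

# Hypothetical log-normalised expression of one gene across six cells.
# Zeros mean "not detected" and should stay zero for interaction scoring.
expr = np.array([0.0, 0.0, 0.0, 1.5, 2.0, 2.5])

# Per-gene z-scaling: subtract the mean, divide by the (population) std.
z = (expr - expr.mean()) / expr.std()

# The three zeros all become roughly -0.961.
print(np.round(z, 3))
```

Any transform with this property, not only z-scoring, violates the requirement quoted above.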
In short, your counts data cannot contain negative values. For example, the LogNormalize method of Seurat's NormalizeData (see: https://satijalab.org/seurat/reference/normalizedata) outputs non-negative values.
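For reference, Seurat documents LogNormalize as dividing each cell's feature counts by that cell's total, multiplying by a scale factor (10,000 by default), and applying log1p. The sketch below reproduces that recipe in pandas; it is an illustration, not Seurat's or CellphoneDB's code, and the function name log_normalize and the toy counts are hypothetical. It assumes raw counts in a genes x cells DataFrame and that every cell has a nonzero total (a cell with zero total would produce NaN).

```python
import numpy as np
import pandas as pd


def log_normalize(raw_counts: pd.DataFrame, scale_factor: float = 1e4) -> pd.DataFrame:
    """Hypothetical LogNormalize-style transform of a genes x cells raw-count matrix.

    Each cell (column) is divided by its total count, multiplied by
    scale_factor, and natural-log transformed with log1p.
    """
    totals = raw_counts.sum(axis=0)  # total counts per cell
    return np.log1p(raw_counts.div(totals, axis=1) * scale_factor)


# Hypothetical raw UMI counts (genes x cells).
raw = pd.DataFrame(
    {"cell_1": [0, 3, 7], "cell_2": [5, 0, 5]},
    index=["VEGFA", "FLT1", "KDR"],
)

norm = log_normalize(raw)
print(norm.round(3))
```

Because log1p(0) is 0 and all inputs are non-negative, zeros stay zero and no value goes negative, which is the property the scoring step needs. The result could then be written out with norm.to_csv(...) as the counts file.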
Best,
Robert.
Hi Robert,
Apologies for the late reply; I thought I had already posted a response.
Can I send an email with the output I generated and also ask some questions about how to set up the initial run?
Happy holidays!
best,
Patrick
Hi Patrick,
I'm afraid I may not be able to comment on any steps prior to the CellphoneDB analysis, but feel free to ask me about any issues that occur during the analysis with the package. Hope that helps.
Best,
Robert
Thanks Robert!
I sent you a quick email about it all with a data link.
best,
Patrick
Help was provided to the user on CellphoneDB use via email.