Prioritization Weights Seem Arbitrary
DarioS opened this issue · 1 comments
Why are these values good values? How can parameter tuning be done by a user without having a ground truth data set?
prioritizing_weights_DE = c("de_ligand" = 1, "de_receptor" = 1)
prioritizing_weights_activity = c("activity_scaled" = 2)
prioritizing_weights_expression_specificity = c("exprs_ligand" = 2, "exprs_receptor" = 2)
prioritizing_weights_expression_sufficiency = c("frac_exprs_ligand_receptor" = 1)
prioritizing_weights_relative_abundance = c("abund_sender" = 0, "abund_receiver" = 0)
Hi @DarioS
This is a good question. The short answer is: these weights are arbitrary.
You should see MultiNicheNet as an aid in finding potentially relevant cell-cell communication patterns in your biological system. MultiNicheNet does this by considering several complementary criteria that we think are relevant for cell-cell communication (and that can be estimated from transcriptomics data). Given the output, you can then decide as the end user which interactions you want to validate further.
These default weights were chosen because they equally emphasize target gene enrichment, differential expression, and cell-type specific expression, criteria that we think are relevant for most datasets. We cannot really say these are "good values" and, as you mention, there are indeed no ground truth data sets to train a model on. An exception is the expression_sufficiency/frac_exprs_ligand_receptor weight, which is half the weight of the other criteria. We admit this is very arbitrary and fixed it for v2 (see the bottom of this message).
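To make the role of the weights concrete, here is a minimal sketch of a weighted average over scaled criterion scores. It is an illustration only: the scores are made-up toy values, the helper `weighted_score` is hypothetical, and MultiNicheNet's actual aggregation may differ in its details.

```r
# v1 default weights from this thread, combined into a single named vector
prioritizing_weights <- c(
  de_ligand = 1, de_receptor = 1,
  activity_scaled = 2,
  exprs_ligand = 2, exprs_receptor = 2,
  frac_exprs_ligand_receptor = 1,
  abund_sender = 0, abund_receiver = 0
)

# Toy scaled scores (0 to 1) for one ligand-receptor interaction.
# These values are invented for illustration, not real output.
toy_scores <- c(
  de_ligand = 0.9, de_receptor = 0.4,
  activity_scaled = 0.7,
  exprs_ligand = 0.8, exprs_receptor = 0.5,
  frac_exprs_ligand_receptor = 1.0,
  abund_sender = 0.3, abund_receiver = 0.6
)

# Hypothetical helper: weighted mean of the scores, matching entries by name
weighted_score <- function(scores, weights) {
  stopifnot(setequal(names(scores), names(weights)))
  w <- weights[names(scores)]
  sum(w * scores) / sum(w)
}

score <- weighted_score(toy_scores, prioritizing_weights)

# Share of the final score contributed by each criterion
contributions <- prioritizing_weights[names(toy_scores)] * toy_scores /
  sum(prioritizing_weights)

print(round(contributions, 3))
print(score)
```

In this sketch, criteria with weight 0 (the abundance terms) drop out entirely, and only the ratios between the weights matter, not their absolute values.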
If you would prefer to validate the most strongly differentially expressed ligands/receptors without considering their target gene enrichment or cell-type specificity, you can keep only the prioritizing_weights_DE weights. However, we don't recommend this. The same goes for the other criteria.
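As a rough illustration of why a DE-only prioritization can change the outcome, the sketch below ranks four invented interactions under the v1 default weights and under DE-only weights. The data frame, the helper `score_with`, and the choice to express "DE only" by zeroing the other weights are all assumptions made for this toy example, not MultiNicheNet's implementation.

```r
# Toy scaled scores for four hypothetical interactions; all values are invented
toy <- data.frame(
  interaction = c("L1_R1", "L2_R2", "L3_R3", "L4_R4"),
  de_ligand = c(0.9, 0.8, 0.3, 0.2),
  de_receptor = c(0.9, 0.7, 0.4, 0.1),
  activity_scaled = c(0.1, 0.6, 0.9, 0.2),
  exprs_ligand = c(0.2, 0.7, 0.8, 0.3),
  exprs_receptor = c(0.1, 0.6, 0.9, 0.2),
  frac_exprs_ligand_receptor = c(0.5, 1.0, 1.0, 0.5)
)

# v1 defaults from this thread (abundance weights of 0 omitted)
weights_default <- c(de_ligand = 1, de_receptor = 1, activity_scaled = 2,
                     exprs_ligand = 2, exprs_receptor = 2,
                     frac_exprs_ligand_receptor = 1)

# Assumption: "DE only" is expressed here by setting every other weight to 0
weights_de_only <- c(de_ligand = 1, de_receptor = 1, activity_scaled = 0,
                     exprs_ligand = 0, exprs_receptor = 0,
                     frac_exprs_ligand_receptor = 0)

# Hypothetical helper: weighted mean per row, with columns matched by name
score_with <- function(df, weights) {
  m <- as.matrix(df[, names(weights)])
  as.vector(m %*% weights) / sum(weights)
}

toy$score_default <- score_with(toy, weights_default)
toy$score_de_only <- score_with(toy, weights_de_only)

top_default <- toy$interaction[which.max(toy$score_default)]
top_de_only <- toy$interaction[which.max(toy$score_de_only)]

# Spearman correlation between the two rankings
rank_agreement <- cor(toy$score_default, toy$score_de_only, method = "spearman")

print(toy[, c("interaction", "score_default", "score_de_only")])
print(c(top_default = top_default, top_de_only = top_de_only))
print(rank_agreement)
```

With these toy numbers the top-ranked interaction differs between the two settings. Comparing rankings across a few weight choices like this is one way to see how sensitive a shortlist is to the weights when no ground truth is available.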
Importantly, for multinichenetr-v2 (currently on the dev branch, soon on main), we changed the code so that users can no longer change the prioritization weights directly, but instead have to choose between biological "scenarios":
The regular scenario again gives equal weights to target gene enrichment, differential expression, and cell-type specific expression. In contrast to the default values of v1, frac_exprs_ligand_receptor now gets the same weight too.
In addition to the regular scenario, we added some other scenarios. For example, scenario = "lower_DE" is a potential alternative if your hypothesis is that the differential CCC patterns in your data are less likely to be driven by DE, such as in cases of differential migration into a niche.