ctlab/fgsea

fgsea gives many highly significant gene sets for a comparison where I expect only a few

ms-gx opened this issue · 1 comment

ms-gx commented

I am using fgsea according to this example:
https://crazyhottommy.github.io/scRNA-seq-workshop-Fall-2019/scRNAseq_workshop_3.html#downstream_analysis_of_scrnaseq_data

fgsea version: 1.18.0

I have one treatment (case 1) where I see a strong perturbation of a cell line and another treatment (case 2) where I just see a slight perturbation.

I compare each of the two treatments with a control condition (using wilcoxauc) and feed the rankings to fgsea. Interestingly, for case 2 (slight perturbation) I see many highly significant gene sets, whereas for case 1 (strong perturbation) I see only a few highly significant gene sets, after which the padj drops off drastically.

This is the gsea call for both:
fgsea(homo_sapiens_gene_set, stats = ranks, scoreType = "pos", eps = 0.0)

I am using the auc statistics from wilcoxauc.

It seems to me that the signal for the slight perturbation is much more subtle, so the weak "background" signal becomes much more prominent. Put differently: there is no clear, dominant signal for the weak perturbation, and fgsea picks up a lot of noise. At least that's how it looks to me.

Am I doing anything wrong or do I have to adjust something?

Would you suggest a statistic other than auc?

EDIT: would you recommend logFC instead?
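
For reference, a minimal sketch of how the two candidate rankings differ. The `de` table below is a made-up stand-in for a wilcoxauc result (the real one has `feature`, `logFC` and `auc` columns, among others), and `make_ranks` is a hypothetical helper, not part of fgsea or presto. The only point is that AUC values lie in [0, 1], with 0.5 meaning no separation, so an AUC ranking has no negative entries, whereas logFC is signed.

```r
# Made-up stand-in for a wilcoxauc() result restricted to one group.
# Gene names and values are illustrative only.
de <- data.frame(
  feature = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  group   = "treated",
  logFC   = c(1.2, -0.4, 0.1, -2.0),
  auc     = c(0.81, 0.42, 0.52, 0.15),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build the named, decreasingly sorted numeric
# vector that fgsea expects as its `stats` argument.
make_ranks <- function(de, column) {
  ranks <- setNames(de[[column]], de$feature)
  sort(ranks, decreasing = TRUE)
}

ranks_auc   <- make_ranks(de, "auc")    # all values in [0, 1]
ranks_logfc <- make_ranks(de, "logFC")  # positive and negative values

print(ranks_auc)
print(ranks_logfc)
```

Either vector can be passed as `stats`; which `scoreType` is appropriate depends on whether the statistic is signed.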

I think I didn't get your experimental design. What are you comparing with what? In any case, first, I'd suggest visualizing your enrichment; this will help you understand why a particular pathway is deemed enriched. Second, for single-cell RNA-seq, if you compare one cluster vs. the others, the recommendation is to use logFC as the statistic, see #50.
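
A rough sketch of both suggestions, using simulated data rather than anything from this issue: the gene names, scores and the two gene sets are invented, and `scoreType = "std"` is an assumption that fits a signed statistic such as logFC. The fgsea calls are guarded so the snippet still runs where the package is not installed.

```r
set.seed(1)

# Simulated signed ranking (stand-in for logFC), sorted decreasingly.
genes <- paste0("GENE", seq_len(500))
ranks <- setNames(sort(rnorm(500), decreasing = TRUE), genes)

# Two invented gene sets: one concentrated at the top of the ranking,
# one drawn at random as a negative control.
pathways <- list(
  top_set    = genes[1:25],
  random_set = sample(genes, 25)
)

if (requireNamespace("fgsea", quietly = TRUE)) {
  # Two-sided score because the simulated statistic is signed.
  res <- fgsea::fgsea(pathways, stats = ranks, scoreType = "std", eps = 0.0)
  print(as.data.frame(res)[, c("pathway", "pval", "padj", "NES")])

  # Running-sum plot for one set: shows where its genes sit in the ranking.
  p <- fgsea::plotEnrichment(pathways[["top_set"]], ranks)
  print(p)
}
```

Plotting a few of the "highly significant" sets from the weak comparison this way should show whether their genes really cluster at one end of the ranking or are spread throughout it.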