ctlab/fgsea

Question: unbalanced gene-level statistic warning

TatiKarp opened this issue · 7 comments

Hello! Thank you for the development and maintenance of this project!

When using the fgsea function, I got a warning:

In fgseaMultilevel(...) :
  There were 1 pathways for which P-values were not calculated properly due to unbalanced (positive and negative) gene-level statistic values. For such pathways pval, padj, NES, log2err are set to NA. You can try to increase the value of the argument nPermSimple (for example set it nPermSimple = 100000)

Increasing nPermSimple still results in the same warning.
So what exactly is this warning about, and why does it prevent the function from calculating the statistics?

Thank you in advance!

When the input statistic is unbalanced (skewed towards positive or negative values), it's hard to accurately estimate a two-tailed GSEA p-value. Following Subramanian et al., the p-value is normalized by the probability that a random gene set has a score of the same sign (positive or negative), and for unbalanced statistic values this probability can be very low and hard to estimate.
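To see why an unbalanced statistic breaks the two-tailed estimate, here is a minimal Python sketch. It is not fgsea's implementation (fgseaMultilevel estimates p-values differently); it assumes a weighted running-sum ES with weight 1, plain random gene sets of the same size as the tested set, and a conditional p-value that compares the observed ES only against random ES values of the same sign. The helper names (enrichment_score, random_scores, two_tailed_pvalue) and the two toy statistic vectors are hypothetical.

```python
import math
import random


def enrichment_score(stats, gene_set):
    """Weighted (weight = 1) running-sum enrichment score.

    stats must be sorted in decreasing order; gene_set is a set of positions
    in that list. Returns the signed deviation from zero with the largest
    magnitude.
    """
    n, k = len(stats), len(gene_set)
    hit_norm = sum(abs(stats[i]) for i in gene_set)
    miss_step = 1.0 / (n - k)
    running = best = 0.0
    for i, s in enumerate(stats):
        # Hits step up by their share of |statistic|; misses step down evenly.
        running += abs(s) / hit_norm if i in gene_set else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


def random_scores(stats, k, n_perm=1000, seed=1):
    """ES values of n_perm random gene sets of size k (plain permutation)."""
    rng = random.Random(seed)
    positions = range(len(stats))
    return [enrichment_score(stats, set(rng.sample(positions, k)))
            for _ in range(n_perm)]


def two_tailed_pvalue(es, rand):
    """Compare ES only with random scores of the same sign.

    Returns None when no random score of that sign was sampled: the
    normalizing probability cannot be estimated, so the result is NA.
    """
    same_sign = [r for r in rand if (r > 0) == (es > 0)]
    if not same_sign:
        return None
    extreme = sum(abs(r) >= abs(es) for r in same_sign)
    return (extreme + 1) / (len(same_sign) + 1)


n, k = 400, 15
balanced = [3 - 6 * i / (n - 1) for i in range(n)]  # symmetric around zero
skewed = [math.exp(-i / 30) for i in range(n)]       # a few large values on top

rand_by_name = {}
for name, stats in [("balanced", balanced), ("skewed", skewed)]:
    rand_by_name[name] = random_scores(stats, k)
    neg_share = sum(r < 0 for r in rand_by_name[name]) / len(rand_by_name[name])
    print(name, "share of negative random ES:", neg_share)

# A gene set at the very bottom of the skewed list has a strongly negative ES,
# but there are few or no negative random scores to normalize it against.
bottom = set(range(n - k, n))
es_bottom = enrichment_score(skewed, bottom)
print("ES:", es_bottom, "p-value:", two_tailed_pvalue(es_bottom, rand_by_name["skewed"]))
```

With the symmetric vector, about half of the random scores are negative; with the exponentially decaying one, almost none are, so the bottom gene set has little or nothing to be compared with and its two-tailed p-value may come out as None.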

How exactly did you get these ranks? Do you want a two-tailed test, or would a one-tailed test suffice? A one-tailed test doesn't have this problem. It can be run by setting scoreType="pos" or scoreType="neg", depending on whether you are looking for positive or negative enrichment.

Thank you,
I created the ranks based on a list of differentially expressed genes using this metric: -10logFDR * (+/- from FC).
Based on this ranking I have 8015 positive values and 9141 negative values. Do you think this difference can cause the imbalance?
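The counts above are close to balanced: 8015 / (8015 + 9141) ≈ 0.467, so about 47% of the values are positive. In the weighted running sum of Subramanian et al. (weight 1), each hit contributes in proportion to |statistic|, so it may also be worth comparing the total absolute weight on each side, not just the counts. A small Python sketch, assuming "-10logFDR" means -log10(FDR) and the sign comes from the fold change; signed_rank_metric, balance_summary and the five example rows are hypothetical.

```python
import math


def signed_rank_metric(fdr, log_fc):
    """Hypothetical helper: -log10(FDR) carrying the sign of the fold change.

    Assumes 0 < fdr <= 1; an FDR of exactly 1 gives a score of 0.
    """
    return -math.log10(fdr) * math.copysign(1.0, log_fc)


def balance_summary(stats):
    """Count values on each side of zero and sum their absolute sizes."""
    pos = [s for s in stats if s > 0]
    neg = [s for s in stats if s < 0]
    return {
        "n_pos": len(pos),
        "n_neg": len(neg),
        "share_pos": len(pos) / (len(pos) + len(neg)),
        "weight_pos": sum(pos),
        "weight_neg": -sum(neg),
    }


# Made-up (FDR, logFC) pairs, only to show the calculation.
rows = [(1e-10, 2.1), (0.003, -0.8), (0.04, 0.5), (0.2, -1.3), (0.9, 0.1)]
stats = sorted((signed_rank_metric(f, fc) for f, fc in rows), reverse=True)
print(balance_summary(stats))

# Share of positive values for the counts reported in this thread.
print(8015 / (8015 + 9141))
```

In the toy rows the counts are 3 versus 2, but the absolute weight on the positive side is several times larger, which is the kind of asymmetry the counts alone do not show.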

I have a related question about this warning (I hope it's OK to post it here). I am getting it while using logFC, and in my case the input statistic is quite unbalanced (80% up, 20% down). Why isn't the NES calculated in this case? Maybe I'm misunderstanding how NES is calculated or how the unbalanced input creates uncertainty.

Just to explain my intended use: I was hoping to use NES as a quantitative score for each gene set in each of my samples, essentially to build a sample x gene set matrix. I would prefer not to have missing values, so I am wondering whether NES can be calculated, or assigned some value, instead of being missing.

@TatiKarp I don't think logFDR is a good choice of metric; try using the log p-value instead, or the statistic value from the differential gene expression test.

@shnuggles NES is calculated by dividing the enrichment score (ES) of the gene set by the average ES of random gene sets with the same sign. In your case, I guess, it's hard to generate a sample of gene sets with negative ES values, so we can't calculate the average and therefore can't calculate NES.
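The NES rule described above can be written down directly. A minimal Python sketch, assuming the denominator is the absolute value of the mean random ES of the same sign (so NES keeps the sign of ES); normalized_es and the random ES values are made up for illustration. It returns None when no random ES of the needed sign exists, which corresponds to the NA in the output.

```python
from statistics import mean


def normalized_es(es, random_es):
    """ES divided by the mean random ES of the same sign (hypothetical helper).

    Uses the absolute value of that mean so NES keeps the sign of ES.
    Returns None (NA) when no random ES of the same sign is available.
    """
    same_sign = [r for r in random_es if r != 0 and (r > 0) == (es > 0)]
    if not same_sign:
        return None
    return es / abs(mean(same_sign))


# Made-up random ES values: all positive, as can happen with a skewed statistic.
random_es = [0.31, 0.42, 0.27, 0.55, 0.38]
print(normalized_es(0.6, random_es))   # positive ES: NES can be computed
print(normalized_es(-0.5, random_es))  # negative ES: no negative ES to average -> None
```

Without any same-sign random scores there is no denominator, which is why the NES is left missing rather than filled in.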

A workaround is to run a one-tailed test by specifying scoreType="pos": only the positive direction of enrichment is considered, and the computation becomes much simpler.
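For comparison, a one-tailed score only looks at the positive side. Below is a rough Python sketch of that idea, assuming the score is the largest positive deviation of the weighted running sum and the p-value is the share of random gene sets scoring at least as high. It illustrates the idea behind scoreType="pos" but is not fgsea's code; positive_es, one_tailed_pvalue and the toy statistic are hypothetical.

```python
import math
import random


def positive_es(stats, gene_set):
    """Largest positive deviation of the weighted running sum (never below 0).

    stats must be sorted in decreasing order; gene_set holds positions.
    """
    n, k = len(stats), len(gene_set)
    hit_norm = sum(abs(stats[i]) for i in gene_set)
    miss_step = 1.0 / (n - k)
    running = best = 0.0
    for i, s in enumerate(stats):
        running += abs(s) / hit_norm if i in gene_set else -miss_step
        best = max(best, running)
    return best


def one_tailed_pvalue(stats, gene_set, n_perm=1000, seed=1):
    """Share of random gene sets scoring at least as high (+1 correction).

    Every random score is >= 0, so all n_perm draws enter the denominator
    and there is no sign-specific probability that could be too small.
    """
    rng = random.Random(seed)
    es = positive_es(stats, gene_set)
    positions = range(len(stats))
    at_least = sum(
        positive_es(stats, set(rng.sample(positions, len(gene_set)))) >= es
        for _ in range(n_perm)
    )
    return es, (at_least + 1) / (n_perm + 1)


n, k = 400, 15
skewed = [math.exp(-i / 30) for i in range(n)]  # toy skewed statistic
print(one_tailed_pvalue(skewed, set(range(k))))         # gene set at the top
print(one_tailed_pvalue(skewed, set(range(n - k, n))))  # gene set at the bottom
```

Even with a heavily skewed statistic, both gene sets get a p-value: the top set a very small one, the bottom set one close to 1. The mirror-image score for negative enrichment would track the most negative deviation instead.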

@assaron thank you for your suggestion. I am using FDR values from the differential gene expression test, which will produce the same ranks as nominal p-values. Why do you think this is not the right way?
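The thread does not spell out the reasoning, but one property that may be relevant: Benjamini-Hochberg adjustment keeps the order of the p-values, yet it can map several distinct p-values to the same adjusted value, so a metric built on FDR can contain ties that the p-values did not have. A minimal Python sketch of BH; the helper benjamini_hochberg and the four p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvalues = [0.010, 0.011, 0.012, 0.040]  # made-up, all distinct
fdr = benjamini_hochberg(pvalues)
print(fdr)  # order is kept, but the first three values collapse into one tie
```

Here the three smallest p-values end up with the same adjusted value, so their relative order is lost in a ranking built on -log10(FDR).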

@shnuggles If I may suggest something: you could look at GSVA analysis to create a sample x gene set matrix. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-7