JEFworks/liger

The weight in the enrichment score calculation

rivers-yao opened this issue · 3 comments

Hi, I am trying to understand the GSEA algorithm through your code.
I found that line 41 of functions.R,
es <- (abs(values) ^ power) * weight
differs from the paper's formula in that it multiplies by a weight variable.
The raw formula for P_hit(S, i) in the paper is more like
es <- abs(values ^ power)
Am I missing something?

Hello,

Thanks for your question. The placement of the parentheses makes no difference in this case, because the absolute value is taken either way and power is a non-negative integer. I'll update the documentation on the power variable to make this clearer.

values <- seq(-10, 10)
power <- 3
es1 <- abs(values) ^ power
es2 <- abs(values ^ power)
all(es1 == es2)  # TRUE: both forms agree for every value
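
As a small sketch of why the integer condition matters: with a fractional power, R returns NaN for a negative base, so only the form that takes the absolute value first stays defined. The variable names below (`int_power`, `frac_power`, `outer_abs`, `inner_abs`) are illustrative and do not appear in liger.

```r
# Integer power: both forms agree for every value, including negatives.
values <- seq(-10, 10)
int_power <- 3
same_for_integer <- all(abs(values) ^ int_power == abs(values ^ int_power))

# Fractional power: a negative base raised to a non-integer exponent is NaN in R,
# so taking abs() after the power cannot recover a usable number.
frac_power <- 0.5
outer_abs <- abs(values) ^ frac_power                    # finite for all values
inner_abs <- suppressWarnings(abs(values ^ frac_power))  # NaN where values < 0

same_for_integer
any(is.nan(inner_abs))
```

So the equivalence holds under the integer assumption stated above, and `abs(values) ^ power` is the safer form if a fractional power is ever passed.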

Feel free to let me know if you have any other questions.

Best,
Jean

Let me make my question clear.
The weight argument does not exist in formula [1], which the paper uses to calculate P_hit(S, i), whereas your gsea() function lets the user provide a weight vector, with one weight per gene, for calculating P_hit(S, i). Is that right?

Best,
Yao

Hi Yao,

Ah, thanks for the clarification. You're correct: the weighted statistic is not described in the original manuscript. It is described in the GSEA user guide and is available as an option in the downloadable GUI: http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/FAQ#What_is_the_difference_between_the_weighted_statistic_and_the_classic_statistic.3F_Which_should_I_use.3F
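
To make the role of the weight concrete, here is a minimal sketch of a running-sum enrichment score in which each hit contributes `(abs(values) ^ power) * weight`. This is not liger's `gsea()` implementation: `running_es` and its arguments are hypothetical, the weight vector is assumed to be in the same order as `values`, and the gene set is assumed to contain at least one gene with a non-zero contribution.

```r
# Hypothetical helper, not liger's gsea(): weighted running-sum enrichment score.
# values:  named numeric vector of gene-level statistics
# geneset: character vector of gene names in the set
# power:   exponent applied to |values| (0 gives the classic, unweighted statistic)
# weight:  optional per-gene multiplier, assumed to be in the same order as values
running_es <- function(values, geneset, power = 1,
                       weight = rep(1, length(values))) {
  ord <- order(values, decreasing = TRUE)  # rank genes from highest to lowest
  values <- values[ord]
  weight <- weight[ord]
  in_set <- names(values) %in% geneset

  # Each hit contributes (|value| ^ power) * weight; each miss contributes 1.
  hit <- ifelse(in_set, (abs(values) ^ power) * weight, 0)
  miss <- ifelse(in_set, 0, 1)

  # Running difference between the hit and miss cumulative fractions.
  running <- cumsum(hit) / sum(hit) - cumsum(miss) / sum(miss)

  # The enrichment score is the running-sum value with the largest magnitude.
  running[which.max(abs(running))]
}

values <- c(g1 = 5, g2 = 3, g3 = 1, g4 = -2, g5 = -4)
geneset <- c("g1", "g3")

es_default <- running_es(values, geneset)
es_weighted <- running_es(values, geneset, weight = c(0.2, 1, 1, 1, 1))
```

With all weights equal to 1 the result reduces to the power-weighted statistic alone; down-weighting `g1` in the toy data shifts where the running sum peaks, so the two scores differ.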

For assessing the significance of the ES, the original manuscript also describes a full permutation of the labels followed by recomputation of all differential expression statistics (i.e., the ranked values), while Liger only permutes the labels on the values. A good explanation can be found here: http://baderlab.org/CancerStemCellProject/VeroniqueVoisin/AdditionalResources/GSEA
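
The label-only permutation can be sketched roughly as follows. This is one reading of the idea under stated assumptions, not liger's code: `es_stat` is a hypothetical helper, the values are simulated toy data, and the p-value formula with its +1 correction is an illustrative choice.

```r
set.seed(1)  # only to make the sketch reproducible

# Hypothetical helper (not liger's code): running-sum ES for values that are
# already sorted from highest to lowest, given a logical membership vector.
es_stat <- function(values, in_set, power = 1) {
  hit <- ifelse(in_set, abs(values) ^ power, 0)
  running <- cumsum(hit) / sum(hit) - cumsum(!in_set) / sum(!in_set)
  running[which.max(abs(running))]
}

# Toy ranked statistics, sorted from highest to lowest.
values <- sort(rnorm(200), decreasing = TRUE)
in_set <- seq_along(values) <= 20  # pretend the top 20 genes form the set

observed <- es_stat(values, in_set)

# Null distribution: keep the values fixed and shuffle which positions are
# labeled as set members. No differential expression statistic is recomputed.
n_perm <- 1000
null_es <- replicate(n_perm, es_stat(values, sample(in_set)))

# Empirical one-sided p-value; the +1 terms avoid reporting exactly zero.
p_value <- (sum(null_es >= observed) + 1) / (n_perm + 1)
```

The full scheme in the manuscript would instead shuffle the sample labels and recompute the ranked values inside each permutation, which is considerably more expensive.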

Best,
Jean