The weight on enrichment score calculation
rivers-yao opened this issue · 3 comments
Hi, I am trying to understand the GSEA algorithm through your code.
I found that line 41 of functions.R,
es <- (abs(values) ^ power) * weight
differs from the paper's formula, because the code multiplies in a weight variable.
The raw formula for P_hit(S, i) in the paper is more like
es <- abs(values ^ power)
Am I missing something?
Hello,
Thanks for your question. The parentheses make no difference in this case, because the absolute value is taken either way and power is a non-negative integer. I'll update the documentation for the power variable to make this clearer.
values <- seq(-10, 10)
power <- 3
es1 <- abs(values) ^ power   # absolute value first, then power
es2 <- abs(values ^ power)   # power first, then absolute value
all(es1 == es2)              # TRUE
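The integer condition matters. As a minimal sketch, assuming a fractional power of 0.5 (a value chosen only to show the edge case, not one taken from the package), the two forms stop agreeing for negative values, because R returns NaN when a negative number is raised to a non-integer power:

```r
# Same comparison, but with a non-integer power (assumed for illustration).
values <- seq(-10, 10)
power <- 0.5

es1 <- abs(values) ^ power   # defined for every value
es2 <- abs(values ^ power)   # NaN wherever values < 0

neg <- values < 0
all(is.nan(es2[neg]))                     # negative inputs are undefined in es2
isTRUE(all.equal(es1[!neg], es2[!neg]))   # the two forms agree elsewhere
```

So abs(values) ^ power is the safer form if power is ever allowed to be fractional.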
Feel free to let me know if you have any other questions.
Best,
Jean
Let me clarify my question.
The weight argument does not appear in the formula [1] that the paper uses to calculate P_hit(S, i), whereas your gsea() function lets the user provide a weight vector, with one weight per gene, for calculating P_hit(S, i). Is that right?
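To make the difference concrete, here is a rough sketch of the running-sum statistic with an optional per-gene weight. The function running_es() and its arguments are hypothetical names for illustration, not the package's actual code. With weight left at 1, the hit term reduces to |r_j|^p as in the paper:

```r
# Hypothetical helper, not liger's actual implementation: enrichment score
# over a ranked list, with an optional per-gene weight.
running_es <- function(values, in_set, power = 1,
                       weight = rep(1, length(values))) {
  stopifnot(length(values) == length(in_set),
            length(values) == length(weight))
  # Rank genes by decreasing value.
  ord <- order(values, decreasing = TRUE)
  values <- values[ord]
  in_set <- in_set[ord]
  weight <- weight[ord]

  # Hit contribution per gene: |r_j|^p, times the extra weight.
  hit <- (abs(values) ^ power) * weight * in_set
  miss <- as.numeric(!in_set)

  p_hit <- cumsum(hit) / sum(hit)      # P_hit(S, i)
  p_miss <- cumsum(miss) / sum(miss)   # P_miss(S, i)
  dev <- p_hit - p_miss
  dev[which.max(abs(dev))]             # ES: maximum deviation from zero
}

# Toy ranked list (made-up numbers); genes 1, 2 and 5 are in the set.
values <- c(5, 4, 3, 2, 1, -1, -2, -3)
in_set <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)

es_paper <- running_es(values, in_set)   # weight = 1: the paper's form
es_weighted <- running_es(values, in_set,
                          weight = c(3, 1, 1, 1, 1, 1, 1, 1))
```

Up-weighting the first gene shifts more of the hit mass to the top of the list, so the two scores differ even though the ranking is the same.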
Best,
Yao
Hi Yao,
Ah, thanks for the clarification. You're correct that the weighted statistic is not described in the original manuscript. It is described in the GSEA user guide and is available as an option in the downloadable GUI: http://www.broadinstitute.org/cancer/software/gsea/wiki/index.php/FAQ#What_is_the_difference_between_the_weighted_statistic_and_the_classic_statistic.3F_Which_should_I_use.3F
When assessing the significance of the ES, the original manuscript also describes a full permutation of the labels followed by recomputation of all differential expression statistics, i.e. the ranked values, whereas Liger only permutes the labels on the values. A good explanation can be found here: http://baderlab.org/CancerStemCellProject/VeroniqueVoisin/AdditionalResources/GSEA
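The two null models can be sketched on simulated data as follows. Everything here is an assumption made for illustration: the toy matrix, the difference-of-means statistic, the helper names, and the one-sided p-value. Scheme (b) is one reading of "permutes labels on the values" and is not taken from the package source:

```r
set.seed(1)

# Simulated data: 50 genes x 10 samples, two classes of 5 samples each;
# genes 1-5 form the gene set and carry a planted signal.
n_genes <- 50
expr <- matrix(rnorm(n_genes * 10), nrow = n_genes)
labels <- rep(c(TRUE, FALSE), each = 5)
in_set <- seq_len(n_genes) <= 5
expr[in_set, labels] <- expr[in_set, labels] + 1.5

# Ranking statistic: difference of class means (a stand-in for any
# differential expression statistic).
diff_means <- function(expr, labels) {
  rowMeans(expr[, labels, drop = FALSE]) -
    rowMeans(expr[, !labels, drop = FALSE])
}

# Enrichment score with power = 1 and no extra per-gene weight.
es <- function(values, in_set) {
  ord <- order(values, decreasing = TRUE)
  hit <- abs(values[ord]) * in_set[ord]
  miss <- as.numeric(!in_set[ord])
  dev <- cumsum(hit) / sum(hit) - cumsum(miss) / sum(miss)
  dev[which.max(abs(dev))]
}

values <- diff_means(expr, labels)
observed <- es(values, in_set)
n_perm <- 200

# (a) Phenotype permutation: shuffle sample labels, recompute every statistic.
null_pheno <- replicate(n_perm, es(diff_means(expr, sample(labels)), in_set))

# (b) Gene permutation: keep the observed values, shuffle set membership.
null_gene <- replicate(n_perm, es(values, sample(in_set)))

# Empirical one-sided p-values with the usual +1 correction.
p_pheno <- (sum(null_pheno >= observed) + 1) / (n_perm + 1)
p_gene <- (sum(null_gene >= observed) + 1) / (n_perm + 1)
```

Scheme (a) preserves gene-gene correlation but needs the full expression matrix and is far more expensive; scheme (b) only needs the ranked values.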
Best,
Jean