Potential bug in tg.corr values
Closed this issue · 1 comments
First, thanks so much for this package! I have been using it, and am very excited about the prospects. I am impressed with your use of KEGG directional association alongside REACTOME/GOBP gene sets, this is a quite clever idea.
I noticed unexpected output, and think I discovered a bug. I'm not expert in your code, so I apologize that I could be (1) wrong, or (2) unclear the impact on the overall results. For now, my basic understanding is that the "overall results" appear to be correct best I can tell, and the bug might be limited only to the reported tg.corr
values associated with target genes.
I'll skip ahead to where I think the issue might arise. Below is an excerpt from the function .downstreamRegulatedSignaling()
in the regulatedInference.R
file:
pv <- stats[target.genes, "pval"]
o <- order(pv, decreasing=TRUE)
pv <- pv[o]
target.genes <- target.genes[o]
c <- corrg[r, target.genes]
c <- c[o]
It looks like the variable c
is being ordered twice. The target.genes
are already re-ordered with target.genes[o]
so the values c
should already be in the proper order. Calling c[o]
makes the correlations essentially random, very much out of order with the genes.
For what it's worth, I tested the change myself and seem to confirm that it fixed the problem.
Here is how I worked around the issue, just for clarity:
pv <- stats[target.genes, "pval"]
o <- order(pv, decreasing=TRUE)
pv <- pv[o]
target.genes <- target.genes[o]
c <- corrg[r, target.genes]
# c <- c[o]
Aside: The symptom was seen because some entries in my data represent two genes "HSP90AA1,HSP90AB1"
, so by convention we split the data into two identical rows with rownames "HSP90AA1"
and "HSP90AB1"
. These two rows have identical expression, but I noticed the tg.corr
values were -0.122
and 0.483
... which of course seemed unexpected, since they have identical expression. Anyway, after the change, they both report tg.corr
correlation values 0.306
and 0.306
, which makes more sense to me.
I don't know where else the tg.corr
values are used. I think any downstream steps that use tg.corr
are likely to be incorrect.
Thank you for your attention to the issue, and I hope this helps your work! And thanks again for putting this R package together.
Dear James,
First of all, thank you very much for you interest in BSR.
Yes, unfortunately, the mistake you spotted is true. I have just corrected the code on GitHub. This being said, the computed correlations are for information only in the case of the regulated (cluster-based) scoring. That is this little bug had no impact on any P-value or whatever computation. The code taking care of the default usage based on correlation analysis did not have this bug.
Thanks for letting us know.
Best,
Jacques