jcolinge/BulkSignalR

Potential bug in tg.corr values

Closed this issue · 1 comments

First, thanks so much for this package! I have been using it, and am very excited about the prospects. I am impressed with your use of KEGG directional association alongside REACTOME/GOBP gene sets, this is a quite clever idea.

I noticed unexpected output, and think I discovered a bug. I'm not expert in your code, so I apologize that I could be (1) wrong, or (2) unclear the impact on the overall results. For now, my basic understanding is that the "overall results" appear to be correct best I can tell, and the bug might be limited only to the reported tg.corr values associated with target genes.

I'll skip ahead to where I think the issue might arise. Below is an excerpt from the function .downstreamRegulatedSignaling() in the regulatedInference.R file:

pv <- stats[target.genes, "pval"]
o <- order(pv, decreasing=TRUE)
pv <- pv[o]
target.genes <- target.genes[o]
c <- corrg[r, target.genes]
c <- c[o]

It looks like the variable c is being ordered twice. The target.genes are already re-ordered with target.genes[o] so the values c should already be in the proper order. Calling c[o] makes the correlations essentially random, very much out of order with the genes.

For what it's worth, I tested the change myself and seem to confirm that it fixed the problem.

Here is how I worked around the issue, just for clarity:

pv <- stats[target.genes, "pval"]
o <- order(pv, decreasing=TRUE)
pv <- pv[o]
target.genes <- target.genes[o]
c <- corrg[r, target.genes]
# c <- c[o]

Aside: The symptom was seen because some entries in my data represent two genes "HSP90AA1,HSP90AB1", so by convention we split the data into two identical rows with rownames "HSP90AA1" and "HSP90AB1". These two rows have identical expression, but I noticed the tg.corr values were -0.122 and 0.483... which of course seemed unexpected, since they have identical expression. Anyway, after the change, they both report tg.corr correlation values 0.306 and 0.306, which makes more sense to me.

I don't know where else the tg.corr values are used. I think any downstream steps that use tg.corr are likely to be incorrect.

Thank you for your attention to the issue, and I hope this helps your work! And thanks again for putting this R package together.

Dear James,

First of all, thank you very much for you interest in BSR.

Yes, unfortunately, the mistake you spotted is true. I have just corrected the code on GitHub. This being said, the computed correlations are for information only in the case of the regulated (cluster-based) scoring. That is this little bug had no impact on any P-value or whatever computation. The code taking care of the default usage based on correlation analysis did not have this bug.

Thanks for letting us know.

Best,
Jacques