Two basic questions on coloc: (1) only P-value and MAF? (2) why proteome-wide MR still needs eQTL coloc?
Hi, guys:
I have two fundamental questions about colocalization.
- Below is Figure 1 of the original coloc paper. So is only the P-value considered in the Bayesian test, and not BETA and its variance?
- For proteome-wide analysis, we are already studying a protein (not an exposure such as BMI). Once we have found a causal effect of a protein on an outcome through traditional MR, why is a coloc analysis still needed to test whether some eQTL is involved in this "protein --> outcome" relationship?
Your clarification/teaching would be greatly appreciated!
Jie
Dear Chris:
Thank you very much!
1. I would imagine that BETA and SE of course offer something more than the P-value alone. But at least Figure 1 of your 2014 paper does not imply that BETA/SE are needed, correct?
I just looked at your GitHub code, pasted below, and I did not see BETA/SE there.
2. I feel that in population genetics LD can be blamed for everything while COLOC can save everything :-). In my view, coloc is basically like checking whether two kids have similar daily regimens (e.g., the time they get up, take the school bus, watch TV, walk the dog, etc.) in order to determine whether they were born to the same parents or at least live in the same neighbourhood.
For proteome-wide MR, people usually do NOT use a single variant as the instrumental variable. Nevertheless, a Lancet 2012 paper did use a single variant within the LIPG gene (https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(12)60312-2/fulltext) to test the causality of HDL on myocardial infarction (MI). So, do you mean this type of MR could be confounded by LD?
Now if I run coloc on LIPG pQTL --> MI, I am testing H3, whereas if I run coloc on LIPG eQTL --> MI, I am testing H4? If that is true, I find it hard to understand. After all, the pQTL is the downstream product of the eQTL. Furthermore, the pQTL is more accurate than the eQTL, because the eQTL is "fake" data (from a remote project, GTEx) while the pQTL is real data (measured in the same individuals as the disease phenotype study).
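For reference on H3 and H4: a single coloc run scores all five hypotheses together (H0: no association; H1 and H2: association with one trait only; H3: both traits, distinct causal variants; H4: both traits, one shared causal variant). Below is a minimal Python sketch of how per-SNP log Bayes factors for two traits could be combined into posterior probabilities. It is not the package's R code: the function names are hypothetical, and it assumes at most one causal variant per trait and prior probabilities p1 = p2 = 1e-4 and p12 = 1e-5.

```python
import math

def logsumexp(xs):
    """log(sum(exp(x))) computed stably."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def logdiffexp(a, b):
    """log(exp(a) - exp(b)); requires a > b."""
    return a + math.log1p(-math.exp(b - a))

def combine_abf(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4].

    l1, l2: per-SNP log approximate Bayes factors for trait 1 and trait 2,
    in the same SNP order. Needs at least two SNPs. Priors are assumptions.
    """
    l12 = [a + b for a, b in zip(l1, l2)]
    s1, s2, s12 = logsumexp(l1), logsumexp(l2), logsumexp(l12)
    log_h = [
        0.0,                                   # H0: no association
        math.log(p1) + s1,                     # H1: trait 1 only
        math.log(p2) + s2,                     # H2: trait 2 only
        # H3: all SNP pairs (i, j) with i != j, i.e. all pairs minus i == j
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),
        math.log(p12) + s12,                   # H4: same SNP drives both
    ]
    denom = logsumexp(log_h)
    return [math.exp(x - denom) for x in log_h]

# Toy inputs: the same SNP peaks in both traits, then different SNPs peak.
shared = combine_abf([10.0, 0.0, 0.0], [10.0, 0.0, 0.0])
distinct = combine_abf([10.0, 0.0, 0.0], [0.0, 10.0, 0.0])
```

In this toy setup, `shared` puts most of its mass on H4 and `distinct` favours H3 over H4, which shows that H3 and H4 are competing outcomes of the same run and not properties of the QTL type supplied.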
Your clarification/teaching would be greatly appreciated!
Best regards,
Jie
Dear Chris:
Thank you very much again for clarification!
- Can you please confirm that https://github.com/chr1swallace/coloc/blob/main/R/claudia.R is the source code that runs when I call coloc.abf? I did see both approx.bf.p and approx.bf.estimates. The former uses P while the latter uses z and V.
- The example I gave does NOT describe correlation, but causation. Based on two kids' daily regimens, I am testing whether a common parent / family caused them. I am not testing whether one kid is correlated with the other.
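On the first point, the P versus z/V distinction can be made concrete with a minimal Python sketch of Wakefield's approximate Bayes factor computed both ways. This is not the package's R code: the function names are hypothetical, and the prior standard deviation of 0.15 and the MAF-based variance formula (for a quantitative trait scaled to unit variance) are assumptions for illustration.

```python
import math
from statistics import NormalDist

def z_from_p(p):
    """Absolute z-score implied by a two-sided P-value (0 < p <= 1)."""
    return -NormalDist().inv_cdf(p / 2.0)

def var_from_maf(maf, n):
    """Approximate Var(beta_hat) from MAF and sample size.

    Assumes a quantitative trait standardised to variance 1.
    """
    return 1.0 / (2.0 * n * maf * (1.0 - maf))

def log_abf(z, v, prior_sd=0.15):
    """Wakefield log approximate Bayes factor from z and V = Var(beta_hat)."""
    r = prior_sd ** 2 / (prior_sd ** 2 + v)  # shrinkage factor
    return 0.5 * (math.log(1.0 - r) + r * z * z)

def log_abf_from_p(p, maf, n, prior_sd=0.15):
    """P-value route: z is recovered from P, V is approximated from MAF and N."""
    return log_abf(z_from_p(p), var_from_maf(maf, n), prior_sd)

def log_abf_from_estimates(beta, se, prior_sd=0.15):
    """Estimate route: z = beta / se and V = se^2 are taken directly."""
    return log_abf(beta / se, se ** 2, prior_sd)

# Toy check: beta = 0.1, se = 0.02 gives z = 5 and V = 0.0004.
# MAF = 0.5 and N = 5000 give the same V under the formula above.
p_two_sided = 2.0 * NormalDist().cdf(-5.0)
from_p = log_abf_from_p(p_two_sided, maf=0.5, n=5000)
from_est = log_abf_from_estimates(beta=0.1, se=0.02)
```

Both routes feed the same z and V into the same formula, so they agree whenever the MAF-based variance matches the reported SE squared. The sign of BETA drops out because only z squared is used.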
Best regards,
Jie