Two basic questions on coloc: (1) only P-value and MAF? (2) why proteome-wide MR still needs eQTL coloc?

Question


Opened this issue 21 days ago · 5 comments

jielab commented 21 days ago

Hi, guys:

I have two fundamental questions on colocalization.

1. Below is Figure 1 of the original coloc paper (image not shown). Does this mean only the P-value is considered in the Bayesian test, not BETA and variance?
2. For proteome-wide analysis, we are already studying a protein (not an exposure such as BMI). Once we have found a causal effect of a protein on an outcome through traditional MR, why do we still need a coloc analysis to test whether some eQTL is involved in this "protein --> outcome" relationship?

Your clarification/teaching would be greatly appreciated!

Jie

Answer 1 · 2024-11-13T07:45:51.000Z

1. No, beta and variances give a more accurate test.
2. If you are using single-variant MR, there is a possibility that your result is due to LD between your pQTL and an outcome variant. Coloc would give H3 in this case.
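To make the H3 versus H4 distinction in point 2 concrete, here is a minimal Python sketch of how per-SNP log approximate Bayes factors (ABFs) for two traits can be combined into posterior probabilities for H0 to H4 in one region, under the single-causal-variant-per-trait assumption of the 2014 coloc paper. The function names and the toy log ABFs are hypothetical; the prior values p1 = p2 = 1e-4 and p12 = 1e-5 are assumed to match the defaults of coloc.abf. H3 sums over all pairs of different SNPs, which is why it wins when the pQTL and outcome signals sit on separate (possibly LD-linked) variants.

```python
import math

def logsum(xs):
    """Stable log(sum(exp(x))) over a list of log values."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def logdiff(a, b):
    """Stable log(exp(a) - exp(b)); requires a > b."""
    return a + math.log1p(-math.exp(b - a))

def coloc_posteriors(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for one region (hypothetical helper).

    l1, l2: per-SNP log ABFs for trait 1 and trait 2, aligned to the same SNPs.
    Assumes at most one causal variant per trait in the region.
    """
    lsum = [a + b for a, b in zip(l1, l2)]
    log_h = [
        0.0,                                  # H0: neither trait associated
        math.log(p1) + logsum(l1),            # H1: trait 1 only
        math.log(p2) + logsum(l2),            # H2: trait 2 only
        # H3: both traits, different causal SNPs (all pairs i != j)
        math.log(p1) + math.log(p2)
        + logdiff(logsum(l1) + logsum(l2), logsum(lsum)),
        math.log(p12) + logsum(lsum),         # H4: both traits, one shared SNP
    ]
    total = logsum(log_h)
    return [math.exp(x - total) for x in log_h]

# Hypothetical three-SNP region with a strong pQTL signal at SNP 1.
pqtl = [20.0, 0.0, 0.0]
outcome_other_snp = [0.0, 0.0, 20.0]   # outcome signal at a different SNP
outcome_same_snp = [20.0, 0.0, 0.0]    # outcome signal at the same SNP

pp_distinct = coloc_posteriors(pqtl, outcome_other_snp)
pp_shared = coloc_posteriors(pqtl, outcome_same_snp)
```

In the first call the two signals sit on different SNPs and H3 (index 3) receives the largest posterior; in the second they share a SNP and H4 (index 4) does.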


Answer 2 · 2024-11-13T08:31:57.000Z

Dear Chris:

Thank you very much!

1. I would imagine that BETA and SE would of course offer something more than the P-value alone. But Figure 1 of your 2014 paper at least does not imply that BETA/SE is needed, correct?

I just looked at your GitHub code (screenshot pasted below, not shown), and I did not see BETA/SE there.

2. I feel that in population genetics, LD can be blamed for everything while coloc can save everything :-). In my view, coloc is basically like checking whether two kids have similar daily routines (e.g., the time they get up, take the school bus, watch TV, walk the dog, etc.) in order to determine whether they were born to the same parents or at least live in the same neighbourhood.

For proteome-wide MR, people usually do NOT use a single variant as the instrumental variable. Nevertheless, a 2012 Lancet paper did use a single variant within the LIPG gene (https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(12)60312-2/fulltext) to test the causal effect of HDL on myocardial infarction (MI). So, do you mean this type of MR could be confounded by LD?
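Single-variant MR reduces to the Wald ratio (outcome association divided by exposure association at the instrument). The minimal Python sketch below uses invented numbers, not values from the LIPG paper, to show how LD alone can produce a non-zero ratio: the instrument SNP A affects the protein, a different SNP B causes the outcome, and the protein has no effect at all. It assumes standardised genotypes and a single causal outcome variant, so that the marginal outcome association at A equals the LD correlation times the effect of B. This is the situation in which coloc can favour H3 over H4, given enough power to separate the two variants.

```python
def wald_ratio(beta_outcome, beta_exposure, se_outcome):
    """Single-variant MR estimate (Wald ratio) with its first-order SE."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se

# Hypothetical numbers (not taken from the LIPG paper).
# SNP A is the pQTL used as the instrument; SNP B is a different variant that
# truly causes the outcome; the protein itself has NO effect on the outcome.
beta_exposure_a = 0.5   # effect of SNP A on protein level
beta_outcome_b = 0.1    # causal effect of SNP B on the outcome
r_ab = 0.8              # LD (genotype correlation) between SNP A and SNP B

# With standardised genotypes and one causal outcome variant (B), the marginal
# outcome association seen at SNP A is r_ab * beta_outcome_b.
beta_outcome_a = r_ab * beta_outcome_b

estimate, se = wald_ratio(beta_outcome_a, beta_exposure_a, se_outcome=0.01)
# estimate is about 0.16, although the true protein -> outcome effect is zero.
```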

Now, if I run coloc on LIPG pQTL --> MI, I am testing H3, whereas if I run coloc on LIPG eQTL --> MI, I am testing H4? If that is true, I find it hard to understand. After all, pQTL is the downstream product of eQTL. Furthermore, pQTL is more accurate than eQTL, because eQTL is fake data (from a remote GTEx project) while pQTL is real data (measured on the same individuals as the disease phenotype study).

Your clarification/teaching would be greatly appreciated!

Best regards,
Jie

Answer 3 · 2024-11-13T09:40:13.000Z

1. If you give just p-values and MAF, we try to back-calculate (unsigned) beta and SE. So beta/SE are not needed, but they are better.
2. I think what you describe is like correlation. Coloc is based on fine-mapping methods originally derived here: https://pmc.ncbi.nlm.nih.gov/articles/PMC3791416/. It is about causality: it tests whether two traits share a causal variant (H4) or have different causal variants (H3). It does not attempt inference on whether one trait causes the other.
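Point 1 can be sketched as follows: with only a two-sided p-value and MAF, the absolute z-score is recovered from the p-value, the variance of beta is approximated from MAF and sample size, and both are plugged into Wakefield's approximate Bayes factor. This Python sketch reflects my understanding of what the p-value route (approx.bf.p) computes, not its actual code; the function names are hypothetical, the variance formulas assume a standardised quantitative trait or a case-control trait with case fraction s, and the prior SDs (0.15 quantitative, 0.2 case-control) are assumed defaults. Because z is unsigned, the direction of effect is lost.

```python
import math
from statistics import NormalDist

def var_beta_from_maf(maf, n, case_fraction=None):
    """Approximate Var(beta_hat) from MAF f and sample size N.

    Quantitative trait (assumed standardised): 1 / (2 N f (1 - f)).
    Case-control (log-odds), case fraction s: 1 / (2 N f (1 - f) s (1 - s)).
    """
    v = 1.0 / (2.0 * n * maf * (1.0 - maf))
    if case_fraction is not None:
        v /= case_fraction * (1.0 - case_fraction)
    return v

def labf_from_p(p, maf, n, case_fraction=None, sd_prior=None):
    """Wakefield log ABF from a two-sided p-value, MAF and N only."""
    if sd_prior is None:
        # Assumed default prior SDs: 0.15 quantitative, 0.2 case-control.
        sd_prior = 0.15 if case_fraction is None else 0.2
    z = -NormalDist().inv_cdf(p / 2.0)       # |z|: the sign of beta is lost
    v = var_beta_from_maf(maf, n, case_fraction)
    r = sd_prior ** 2 / (sd_prior ** 2 + v)  # shrinkage ratio W / (W + V)
    return 0.5 * (math.log(1.0 - r) + r * z ** 2)

# Hypothetical SNP: MAF 0.3 in 10,000 samples.
strong = labf_from_p(5e-8, maf=0.3, n=10_000)
weak = labf_from_p(0.9, maf=0.3, n=10_000)
```

A genome-wide significant p-value gives a positive log ABF (evidence for association), while a null-looking p-value gives a negative one.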


Answer 4 · 2024-11-14T15:24:40.000Z

Dear Chris:

Thank you very much again for clarification!

1. Can you please confirm that https://github.com/chr1swallace/coloc/blob/main/R/claudia.R is the source code that runs when I call coloc.abf? I did see both approx.bf.p and approx.bf.estimates. The former uses P, while the latter uses z and V.
2. The example I gave does NOT describe correlation, but causation. Based on two kids' daily routines, I am testing whether a common parent / family caused them. I am not testing whether one kid is correlated with the other.

Best regards,
Jie

Answer 5 · 2024-11-14T17:42:46.000Z

Confirmed. As I said, we can use p-values, but beta and SE (z and V) are more accurate.
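One way to see why beta and SE are more accurate: both routes end in the same Wakefield ABF formula, and they agree only when the variance rebuilt from MAF and N matches the SE that was actually estimated. The Python sketch below is illustrative only; the function names are hypothetical stand-ins rather than coloc's approx.bf.p / approx.bf.estimates, and it assumes a standardised quantitative trait, an assumed prior SD of 0.15, and a hypothetical SNP whose real SE is inflated (for example because fewer samples contribute at that SNP).

```python
import math
from statistics import NormalDist

def wakefield_labf(z, v, sd_prior=0.15):
    """Wakefield log ABF from a z-score, Var(beta_hat) = v and a prior SD."""
    r = sd_prior ** 2 / (sd_prior ** 2 + v)
    return 0.5 * (math.log(1.0 - r) + r * z ** 2)

def labf_estimates_route(beta, se):
    """Use beta and SE directly: z = beta / se, V = se^2."""
    return wakefield_labf(beta / se, se ** 2)

def labf_p_route(beta, se, maf, n):
    """Keep only the p-value, then rebuild |z| and V from p, MAF and N."""
    p = 2.0 * NormalDist().cdf(-abs(beta / se))
    z = -NormalDist().inv_cdf(p / 2.0)
    v = 1.0 / (2.0 * n * maf * (1.0 - maf))  # standardised quantitative trait
    return wakefield_labf(z, v)

maf, n, beta = 0.3, 10_000, 0.06
se_expected = math.sqrt(1.0 / (2.0 * n * maf * (1.0 - maf)))

# Case 1: reported SE matches what MAF and N imply, so the routes agree.
agree = (labf_estimates_route(beta, se_expected),
         labf_p_route(beta, se_expected, maf, n))

# Case 2 (hypothetical): the real SE is 1.5x larger; the p-value route
# recovers |z| correctly but assumes too small a V, so the ABFs differ.
se_inflated = 1.5 * se_expected
differ = (labf_estimates_route(beta, se_inflated),
          labf_p_route(beta, se_inflated, maf, n))
```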
