error in check_dataset: duplicated SNPs

Question

PyunJung-Min opened this issue a year ago · 4 comments

Hi, thanks for the amazing R package. I am new to COLOC.

I'm trying to integrate GWAS summary data (dataset 1) and eQTL summary data (dataset 2).
If I understood correctly, the SNPs in dataset 1 and dataset 2 should be identical. Is this right?

So I merged dataset 1 and dataset 2 by rsid.
However, in the eQTL summary data, multiple ENSG genes are matched to one SNP.
As a result, the merged data (dataset 1 and dataset 2) has many duplicated SNPs with different ENSG genes.

How can I deal with this problem? Or did I make a mistake when preparing the datasets?

Many thanks in advance

Jungmin

Answer 1 · 2023-08-11T07:15:01.000Z

You need to analyse each gene separately. After all, you are testing a separate colocalisation hypothesis for each gene.

…

-- https://chr1swallace.github.io
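
As an illustration of the per-gene split described above, here is a minimal sketch in plain Python. It only shows the data handling, not the coloc call itself. The row layout, the column names (`snp`, `gene`, `beta`, `varbeta`), the gene labels, and the helper functions are all hypothetical and would need to be adapted to the real eQTL file.

```python
from collections import defaultdict

# Hypothetical eQTL summary rows: one row per (SNP, gene) pair.
# The same SNP appears once for every gene it was tested against.
eqtl_rows = [
    {"snp": "rs1", "gene": "ENSG_A", "beta": 0.12, "varbeta": 0.0004},
    {"snp": "rs1", "gene": "ENSG_B", "beta": -0.03, "varbeta": 0.0005},
    {"snp": "rs2", "gene": "ENSG_A", "beta": 0.10, "varbeta": 0.0004},
    {"snp": "rs2", "gene": "ENSG_B", "beta": 0.01, "varbeta": 0.0006},
]


def split_by_gene(rows):
    """Group eQTL rows by gene (illustrative helper, not part of coloc)."""
    by_gene = defaultdict(list)
    for row in rows:
        by_gene[row["gene"]].append(row)
    return dict(by_gene)


def has_duplicate_snps(rows):
    """Return True if any SNP identifier occurs more than once in rows."""
    snps = [row["snp"] for row in rows]
    return len(snps) != len(set(snps))


per_gene = split_by_gene(eqtl_rows)

# The full table repeats SNPs across genes; each per-gene subset does not.
print(has_duplicate_snps(eqtl_rows))
for gene, rows in per_gene.items():
    print(gene, has_duplicate_snps(rows))
```

Each per-gene subset would then be matched to the GWAS rows for the same SNPs and analysed on its own, with one colocalisation run per gene.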

Answer 2 · 2023-08-11T09:10:55.000Z

Thanks for your prompt reply!!

However, my eQTL summary data has 19250 genes.
Is there a smart way to analyse all 19250 genes at once, instead of running "coloc.abf" 19250 times?

Thanks!

Jung-Min

Answer 3 · 2023-08-22T10:47:11.000Z

Sorry, no. But you probably don't want to run all 19250 genes. You know whether each of them has a significant signal in your region of interest, so you can discard the rest.
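
One way to sketch this pre-filtering step is shown below, again in plain Python. The `pval` column, the gene labels, and the thresholds are assumptions made for illustration. What counts as a "significant signal" is a choice left to the analyst and is not specified in this thread.

```python
# Hypothetical eQTL rows for one region of interest, with a p-value per
# (SNP, gene) pair. The column names are illustrative assumptions.
region_rows = [
    {"snp": "rs1", "gene": "ENSG_A", "pval": 1e-8},
    {"snp": "rs2", "gene": "ENSG_A", "pval": 0.02},
    {"snp": "rs1", "gene": "ENSG_B", "pval": 0.4},
    {"snp": "rs2", "gene": "ENSG_B", "pval": 0.3},
    {"snp": "rs1", "gene": "ENSG_C", "pval": 1e-4},
    {"snp": "rs2", "gene": "ENSG_C", "pval": 0.5},
]


def genes_with_signal(rows, p_threshold):
    """Return genes whose smallest p-value in the region is below p_threshold."""
    min_p = {}
    for row in rows:
        gene = row["gene"]
        # Track the smallest p-value seen so far for this gene.
        min_p[gene] = min(row["pval"], min_p.get(gene, 1.0))
    return sorted(gene for gene, p in min_p.items() if p < p_threshold)


# Only the genes returned here would go on to a per-gene coloc run.
strict = genes_with_signal(region_rows, p_threshold=1e-5)
lenient = genes_with_signal(region_rows, p_threshold=1e-3)
print(strict)
print(lenient)
```

Changing `p_threshold` changes how many genes are kept, so the same helper can be reused to compare candidate sets under different cut-offs before running any colocalisation.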

Answer 4 · 2023-08-23T03:06:10.000Z

Thank you for the answer! :)

My goal in using COLOC is to identify causal (target) genes by integrating GWAS summary data for a disease with eQTL summary data. I would like to select target genes using various p-value thresholds. That's why I tried to run COLOC with all 19250 genes.

Could you please advise how to approach this?
I would appreciate any comment :)
Many thanks

Jung-Min