MAF score from GWAS results using COLOC.susie as well as LD matrix question
HKJ396 opened this issue · 12 comments
Hello,
Thank you so much for developing this package. I have my GWAS summary data filtered so that it contains just the top hits, GTEx eQTL data for the relevant tissue, and an LD matrix. However, I have a couple of questions regarding the coloc.susie function.
First of all, when I supply MAF values for the GWAS summary results, should this be the minor allele frequency, or just the frequency of the associated allele, which might not necessarily be the minor allele? In my PLINK allele frequency output, the frequency given is above 0.5 for some alleles and below 0.5 for others.
Here are the first lines of the file:
#CHROM ID REF ALT ALT_FREQS OBS_CT
1 rs376342519:10616:CCGCCGTTGCAAAGGCGCGCCG:C CCGCCGTTGCAAAGGCGCGCCG C 0.995708 3862
1 1:54712:T:TTTTC T TTTTC 0.595903 3862
1 rs368808541:603010:C:A C A 0.00194132 3862
The frequency given for C is 0.995…, whereas on dbSNP the MAF is ~0.1 for CCGCCGTTGCAAAGGCGCGCCG.
For the third line, rs368808541, the MAF matches what's on dbSNP. Therefore, I'm not sure whether these are the values I should use in the coloc function. Or should I compute 1 - (any value above 0.5) to get the minor allele frequency?
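A minimal sketch of that conversion in Python, assuming the value wanted is the minor allele frequency (the smaller of the allele frequency and its complement). The function name `to_maf` is hypothetical, and the three input values are the ALT_FREQS entries from the file excerpt above.

```python
def to_maf(freq: float) -> float:
    """Fold an allele frequency onto [0, 0.5] by taking min(f, 1 - f)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError(f"allele frequency out of range: {freq}")
    return min(freq, 1.0 - freq)


# ALT_FREQS values copied from the three rows shown in the question.
alt_freqs = [0.995708, 0.595903, 0.00194132]

# Frequencies above 0.5 are replaced by 1 - f; the rest are left unchanged.
mafs = [to_maf(f) for f in alt_freqs]

for f, maf in zip(alt_freqs, mafs):
    print(f"ALT_FREQ={f:.6f} -> MAF={maf:.6f}")
```

Note that the folded value for the first row is computed from this sample's own frequency, so it need not agree with a MAF reported by dbSNP for a different population.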
I have also calculated an LD matrix for my top SNPs using the PLINK --r2 option, so that only SNPs within 1 Mb of my top SNPs and with r2 above 0.6 are present. Is this okay? I notice the manual says to use --r rather than --r2 if the datasets have different LD; however, I will only match up the SNPs that are common to both, so I am assuming they will have the same LD?
Thank you in advance.
Thanks for your quick response. So I will provide my full set of GWAS summary results. I will recreate my LD matrix using --r (raw inter-variant allele count correlations) rather than --r2 (which reports squared correlations). Thanks! I will recalculate MAF as 1 - frequency for any value of 0.5 or higher.
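To illustrate the difference between the signed correlation r and the squared correlation r2, here is a toy Python sketch. The genotype vectors are made up, and `pearson_r` is a hypothetical helper computing a plain Pearson correlation; it is not meant to reproduce PLINK's exact calculation. The point is only that squaring discards the sign, so two SNP pairs with opposite directions of correlation become indistinguishable.

```python
from math import sqrt


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Made-up allele counts (0, 1, 2) for six individuals at three SNPs.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [2, 1, 0, 1, 2, 0]  # counted on the opposite allele relative to snp_a
snp_c = [0, 1, 2, 1, 0, 2]  # counted on the same allele as snp_a

r_ab = pearson_r(snp_a, snp_b)  # negative correlation
r_ac = pearson_r(snp_a, snp_c)  # positive correlation

# The signed values differ, but their squares are identical.
print(r_ab, r_ac, r_ab ** 2, r_ac ** 2)
```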
Sorry, but I have another question: when I recreate the LD matrix, should I use the full set of GWAS summary results or the set filtered by p value?
In the PLINK command I have set the window size with --ld-window-kb 1000. Is this okay?
Because I'm no longer computing r2, should I leave out the r2 threshold? I previously kept only SNPs with r2 above 0.6, so I am assuming I should not filter on an r2 value now?
Thank you! So I have provided the full list of SNPs used in the GWAS (roughly 7 million) as the input file to calculate r (which gives me raw inter-variant allele count correlations). The only filter I used was the kb window. Is this okay?
So I supply my full set of GWAS results (all the SNPs used in the GWAS). When you say signal, do you mean the top hits, e.g. all the genome-wide significant hits, or those with a p value below 10^-6 if I haven't got any genome-wide significant hits? Do I then provide a window either side of the top hits? Sorry, I am really confused.
I really do apologise; I'm very new to all of this. I've realised that rather than supplying GWAS results filtered by p value, I have to provide my top hits (those with the lowest p value, which I assume is what sentinel means) together with the SNPs around these peaks, e.g. within 1 Mb of the peak SNPs? I then cross-check these SNPs against the eQTL data, so that only SNPs present in both datasets are included. And then I provide an LD matrix.
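The workflow as described in this comment can be sketched in Python as follows. This is only an illustration of the steps as understood here, not the package's own procedure: all SNP names, positions, p values, and LD values are made up, and the 1 Mb window is the example figure from the comment.

```python
WINDOW_BP = 1_000_000  # example window of 1 Mb either side of the sentinel

# Hypothetical GWAS summary rows for one chromosome.
gwas = [
    {"snp": "rsA", "pos": 1_000_000, "p": 1e-3},
    {"snp": "rsB", "pos": 1_500_000, "p": 1e-9},
    {"snp": "rsC", "pos": 2_200_000, "p": 0.4},
    {"snp": "rsD", "pos": 3_000_000, "p": 0.2},
]
# Hypothetical set of SNPs present in the eQTL dataset.
eqtl_snps = {"rsA", "rsB", "rsD"}

# Step 1: the sentinel is the SNP with the lowest p value.
sentinel = min(gwas, key=lambda row: row["p"])

# Step 2: keep every SNP within the window, regardless of its p value.
region = [row["snp"] for row in gwas
          if abs(row["pos"] - sentinel["pos"]) <= WINDOW_BP]

# Step 3: keep only SNPs that are also present in the eQTL data.
shared = [snp for snp in region if snp in eqtl_snps]

# Step 4: subset a (made-up) signed LD matrix to the shared SNPs, same order.
ld_snps = ["rsA", "rsB", "rsC", "rsD"]
ld = [
    [1.0, 0.8, -0.3, 0.1],
    [0.8, 1.0, -0.2, 0.0],
    [-0.3, -0.2, 1.0, 0.4],
    [0.1, 0.0, 0.4, 1.0],
]
idx = [ld_snps.index(snp) for snp in shared]
ld_shared = [[ld[i][j] for j in idx] for i in idx]

print(sentinel["snp"], region, shared, ld_shared)
```

Keeping the rows and columns of the LD matrix in the same order as the shared SNP list avoids mismatches between the summary statistics and the LD values.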