jrs95/hyprcoloc

Can the correlation matrix use genotypic (LDSC) correlations instead of phenotypic ones?

Closed this issue · 3 comments

I am interested in applying this method of colocalisation to my research. To this end, I have a question about the correlation matrix: no mention is made of whether this should be a phenotypic or a genotypic correlation matrix.
Presumably it was designed with a phenotypic correlation in mind. However, I am interested in analysing a continuous trait together with a number of binary traits, so a phenotypic correlation may not be particularly useful.
I was wondering whether a genotypic correlation from LDSC may be more applicable, since genotypic correlations are typically similar to phenotypic correlations and HyPrColoc uses genetic information for its inferences.
The genetic correlation would of course be genome-wide rather than restricted to the region of interest, but what might the implications of using a genotypic rather than a phenotypic correlation be in terms of adjusting the analysis?

jrs95 commented

Hi,

Apologies for the delayed response. I have recently moved jobs and my GitHub account was still sending notifications to my old University of Bristol address.

The method would work with either. We have only really trialled it with a genotypic correlation, for the reasons you give above, estimated by correlating genome-wide Z-scores from GWAS using the tetrachoric correlation approach. This approach is very similar to LDSC. A nice advantage of these approaches is that they also account for sample overlap between the phenotypes.

Having said this, in simulations where we induced phenotype correlation caused by sample overlap, the standard model mostly outperformed the model that tries to account for the correlation! So our advice is still to use the standard model: the correlation model is much more difficult to fit, and it also requires LD information to adjust the priors when a trait correlation matrix is supplied (there is a complicated argument as to why this is necessary, but @cnfoley should be able to help if you want more information).

Hope this helps.

Best wishes,

James

jrs95 commented

Hi Nick,

The tetrachoric method is very simple. You just correlate genome-wide indicator variables for the phenotypes, where the indicator for a SNP is 0 if its Z-score is negative and 1 if it is positive. This is not included in the package because we would have to devise a mechanism for users to import full genome-wide results, and as we believe the standard model still performs best in this scenario, we didn't think this was necessary.
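
For anyone wanting to try this outside the package, here is a minimal sketch in Python of the indicator-correlation step as described above, taken literally (a plain correlation of the 0/1 sign indicators). It is not part of hyprcoloc: the function name `sign_indicator_correlation` and the demo data are hypothetical, and it assumes the Z-scores for all traits have already been aligned to the same SNPs and the same effect allele.

```python
import numpy as np


def sign_indicator_correlation(z):
    """Correlate 0/1 sign indicators of genome-wide Z-scores.

    z: array of shape (n_snps, n_traits). Assumes rows are the same SNPs
    for every trait and that Z-scores are aligned to the same effect allele
    (hypothetical helper, not part of the hyprcoloc package).
    """
    z = np.asarray(z, dtype=float)
    # Drop SNPs with a missing or exactly-zero Z-score in any trait,
    # since their sign is undefined.
    keep = np.all(np.isfinite(z) & (z != 0), axis=1)
    # Indicator: 0 if the Z-score is negative, 1 if it is positive.
    indicators = (z[keep] > 0).astype(float)
    # Trait-by-trait correlation matrix of the indicators.
    return np.corrcoef(indicators, rowvar=False)


# Simulated demo data only: two traits whose Z-scores are correlated.
rng = np.random.default_rng(0)
z_demo = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=50_000)
trait_cor = sign_indicator_correlation(z_demo)
print(trait_cor)
```

In practice `z` would be built from full genome-wide summary statistics, one column per trait, which is the import step the package does not provide.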

Best wishes,

James