chr1swallace/coloc

Question about sdY

maegsul opened this issue · 1 comments

Hi,

First of all, thanks a lot again @chr1swallace for developing coloc. It has been very useful for answering many of our questions. A big thanks! I have a rather theoretical/statistical question about the "sdY" parameter, the population standard deviation of a quantitative trait.

The vignette here indicates that if a study standardised its (quantitative) trait to have a variance of 1, we can set sdY to 1. However, I am curious whether this recommendation still holds if the study also included a set of covariates in the linear regression model for this already standardised trait.

To give an example: let's say we have a set of standardised gene expression values for geneX across n=100 individuals, and we map cis-eQTL variants near this gene by linear regression with covariates such as sex, age, and principal components, as below:

glm(standardised_geneX_expression ~ genotype + sex + age + PC1 + PC2 + PC3, data = example_data_table, family = gaussian(link = "identity"))

In this case, would it still be correct to set sdY = 1 in the coloc.abf function for this eQTL dataset (supplying beta and varbeta alongside sdY)? My concern is that the outcome, once the covariates are regressed out, might no longer have a standard deviation of 1. The betas and varbetas we obtain (and pass to coloc.abf) would then no longer describe the standardised geneX expression directly, but rather the expression after controlling for sex, age, and principal components.
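To make the concern concrete, here is a minimal simulation sketch in base R. Everything in it is an illustrative assumption rather than real data: the sample size, the allele frequency, the effect sizes, and the variable names are all made up, and only sex and age are used as covariates to keep it short. It compares the marginal SD of a standardised outcome with the residual SD after covariate adjustment, and with the SD implied by the standard error of the genotype coefficient via the usual approximation var(beta) ≈ sd^2 / (n * var(genotype)). It only illustrates the discrepancy; it does not settle which value sdY should take.

```r
# Hypothetical simulation: all names, sizes and effects are illustrative assumptions.
set.seed(1)
n <- 5000
genotype <- rbinom(n, size = 2, prob = 0.3)  # additive genotype coding, allele frequency 0.3
sex <- rbinom(n, size = 1, prob = 0.5)
age <- rnorm(n, mean = 50, sd = 10)

# Raw expression depends on genotype and covariates, then is standardised to variance 1.
raw <- 0.3 * genotype + 0.8 * sex + 0.05 * age + rnorm(n)
y <- as.numeric(scale(raw))

# Same kind of model as in the question, fitted with lm() (Gaussian, identity link).
fit <- lm(y ~ genotype + sex + age)

sd_marginal <- sd(y)       # 1 by construction
sd_residual <- sigma(fit)  # residual SD after adjusting for genotype and covariates

# SD implied by the genotype standard error:
# var(beta) is approximately sd^2 / (n * var(genotype)) when genotype is
# roughly independent of the other covariates, as it is in this simulation.
vbeta <- vcov(fit)["genotype", "genotype"]
sd_implied <- sqrt(vbeta * n * var(genotype))

print(c(marginal = sd_marginal, residual = sd_residual, implied = sd_implied))
```

In this setup the covariates explain a sizeable share of the outcome variance, so the residual SD is clearly below 1 while the marginal SD stays at 1, and the SD implied by varbeta tracks the residual SD rather than the marginal one.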

I was just thinking about it, and I wanted to be on the safe side regarding taking into account sdY parameter correctly. What do you think?

Many thanks,
Fahri