Schork-Lab/mediator-was

How do different normalization steps affect the reliability ratio?


The GTEx samples were quantile normalized after adjusting for covariates.

For PrediXcan: the models were trained on their own data (DGN). I need to go through the literature again to see how those data were normalized, but the predicted values (because of the associated beta values) are in a distinct range.

Combining these and looking at the residuals of the predicted GTEx expression, there is definitely a scale issue. I don't have a good intuition for how this biases the results, but the reliability ratio comes out with a very large range of values: https://github.com/Schork-Lab/mediator-was/blob/master/jupyter/predixcan/gtex_variance.ipynb
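To build some intuition for the scale issue, here is a minimal simulated sketch. It does not reproduce the notebook's estimator. It assumes the textbook errors-in-variables definition of the reliability ratio, Var(signal) / (Var(signal) + Var(error)), and a naive residual-based estimate of it. All names and variances are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical "true" cis expression and a noisy prediction of it.
# Error variance is 0.25, so the reliability ratio is 1 / 1.25 = 0.8.
true_cis = rng.normal(0.0, 1.0, n)
predicted = true_cis + rng.normal(0.0, 0.5, n)


def naive_reliability(pred, measured):
    """Assumed estimator: 1 - Var(pred - measured) / Var(pred)."""
    resid = pred - measured
    return 1.0 - np.var(resid) / np.var(pred)


# Measured expression on the same scale as the prediction.
same_scale = naive_reliability(predicted, true_cis)

# Measured expression on a different scale (arbitrary factor of 3):
# the residual now carries signal variance, not just prediction error.
mismatched = naive_reliability(predicted, 3.0 * true_cis)

print(f"same scale: {same_scale:.2f}, mismatched scale: {mismatched:.2f}")
```

Under these assumptions the same-scale estimate lands near 0.8 by construction, while the mismatched one falls outside [0, 1], which would be one way to end up with a wide range of reliability ratios.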

Continuing this analysis:

Expression = Ebaseline + Ecovariates + Ecis + Etrans + error

PrediXcan was trained on DGN (corrected for hidden covariates using HCP [http://genome.cshlp.org/content/suppl/2013/11/04/gr.155192.113.DC1/Supplemental_Materials.pdf]). Note that they use the trans-eQTL tuning parameters; I don't know how this affects the results. Also, they use the word "normalized", but Battle et al. only mention correcting, not normalizing. For now, let's assume the data were only corrected for covariates. PrediXcan gives you cis-expression, which can in a sense be negative if the baseline expression is high and the SNPs only modulate expression by lowering it.

Excan = Ecis = Expression - Ebaseline - Ecovariates - Etrans - error

GTEx expression is quantile normalized after adjusting for covariates. However, no attempt is made to separate out baseline expression or genotype-mediated expression. As such, it is just

Egtex = Expression - Ecovariates = Ebaseline + Ecis + Etrans + error
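A small simulation of this decomposition shows how little of Egtex the cis term may account for. This is only a sketch: the components are assumed independent, and the variances (0.1 cis, 0.2 trans, 0.7 error) and the baseline value are arbitrary, not estimates from GTEx or DGN.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Hypothetical variance components, chosen only for illustration.
e_baseline = 5.0                               # constant across samples
e_cis = rng.normal(0.0, np.sqrt(0.1), n)
e_trans = rng.normal(0.0, np.sqrt(0.2), n)
error = rng.normal(0.0, np.sqrt(0.7), n)

e_xcan = e_cis                                 # what PrediXcan targets
e_gtex = e_baseline + e_cis + e_trans + error  # covariate-adjusted measurement

# Share of measured variance that the cis component can explain,
# and the correlation between the two quantities.
cis_share = np.var(e_xcan) / np.var(e_gtex)
corr = np.corrcoef(e_xcan, e_gtex)[0, 1]

print(f"cis share of Var(Egtex): {cis_share:.2f}, corr(Excan, Egtex): {corr:.2f}")
```

With these made-up numbers, even a perfect cis predictor leaves most of the variance in Egtex unexplained, so the residuals of Egtex against Excan are dominated by Etrans and error.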

Since Ebaseline is just a constant and does not depend on genotype, it might be kosher to compare the quantile-normalized (or otherwise normalized) values of the GTEx-measured expression with the PrediXcan-predicted GTEx expression.
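A quick check of why a constant baseline should not matter for rank-based normalization. The sketch assumes a per-gene rank-based inverse normal transform as the "quantile normalization" step, which may differ from what the GTEx pipeline actually does. `quantile_normalize` is a hypothetical helper and assumes no tied values.

```python
import numpy as np
from statistics import NormalDist


def quantile_normalize(values):
    """Map values to standard-normal quantiles by rank (assumes no ties)."""
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values)) + 1   # ranks 1..n
    probs = ranks / (len(values) + 1)            # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in probs])


rng = np.random.default_rng(2)
expression = rng.normal(0.0, 1.0, 1000)          # simulated expression values

# Adding a constant baseline leaves the ranks unchanged, and therefore
# leaves the normalized values unchanged as well.
without_baseline = quantile_normalize(expression)
with_baseline = quantile_normalize(expression + 5.0)

print(np.allclose(without_baseline, with_baseline))
```

The same argument covers any strictly increasing rescaling, so this kind of normalization removes both the baseline offset and the scale difference, but not the extra variance components in the measured expression.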

Tried normalizing in commit 7312e26, but the GTEx-measured expression (Egtex) incorporates too many sources of expression compared to Excan, so it is not worthwhile to compare the two directly. It's better to get an estimate of the variance of Epredicted through BSLMM rather than ElasticNet.