eQTL to TWAS
This repo contains code for deriving TWAS models from eQTL summary statistics. The same code can be used for other molecular features as well (e.g. methylation, protein, chromatin). The models are in FUSION format.
Please refer to and cite our paper comparing approaches for integration of eQTL summary statistics with GWAS summary statistics.
compute_weights.R
This script uses eQTL summary statistics to create TWAS models in a format consistent with FUSION.
The essential software and data requirements vary according to the methods selected to create models. The methods currently implemented are:
- top1 - Single strongest eQTL
- PRS-CS - Bayesian polygenic scoring method
- SBayesR - Bayesian polygenic scoring method, implemented in GCTB
- DBSLMM - Bayesian polygenic scoring method
- LDpred2 - Bayesian polygenic scoring method, implemented in bigsnpr
- lassosum - Lasso-regularised polygenic scoring method
- SuSiE - SNP fine-mapping software, with polygenic scoring applications
Essential software and data:
-
eQTL summary statistics
- Must have the following columns: GENE, CHR, SNP, BP, A1, A2, BETA, SE, Z, P, FREQ, N
- BP must be in genome build GRCh37/hg19
-
Reference plink files
- For example, 1000 Genomes Phase 3
- Split by chromosome, containing RSIDs, build GRCh37/hg19
-
R libraries
install.packages(c('optparse','data.table'))
Method specific software and data:
-
PRS-CS
-
DBSLMM
- DBSLMM software
- Note. compute_weights.R does not work with the latest version of DBSLMM. You must git reset to a previous branch as follows
-
git reset --hard aa6e7ad5b8a7d3b6905556a4007c4a1fa2925b7d
-
LDpred2
-
install.packages(c('bigsnpr','dplyr')
-
lassosum
-
install.packages(c('lassosum','fdrtool'))
-
-
SuSiE
-
install.packages('susieR')
-
Example of running the script for a single gene:
Rscript eQTL_to_TWAS/compute_weights.R \
--id ENSG00000206503 \
--extract /Data/ldsc/w_hm3.snplist \
--sumstats /Data/eQTLGen/eQTLGen_sumstats.txt \
--gcta /Software/gcta_1.94 \
--gctb /Software/gctb_203 \
--gctb_ref /Data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_v3_50k_chr \
--plink_ref_chr /Data/1KG/Phase3/1KGPhase3.w_hm3.chr \
--plink_ref_keep /Data/1KG/Phase3/keep_files/EUR_samples.keep \
--ld_blocks /Data/LDetect/EUR \
--rscript Rscript \
--dbslmm /Software/DBSLMM/software \
--plink /Software/plink1.9 \
--PRScs_path /Software/PRScs \
--PRScs_ref_path /Software/PRScs/ldblk_1kg_eur \
--ldpred2_ref_dir /Data/ldpred2_ref \
--output /Data/eQTL_to_TWAS/Test
This will create a file called 'ENSG00000206503.RDat' in the folder '/Data/eQTL_to_TWAS/Test'. This can be parallelised across genes, or run linearly for all genes in the --sumstats file by removing the --id parameter.