eQTL to TWAS

This repo contains code for deriving TWAS models from eQTL summary statistics. The same code can be used for other molecular features as well (e.g. methylation, protein, chromatin). The models are in FUSION format.

Please refer to and cite our paper comparing approaches for integration of eQTL summary statistics with GWAS summary statistics.

compute_weights.R

This script uses eQTL summary statistics to create TWAS models in a format consistent with FUSION.

The essential software and data requirements vary according to the methods selected to create models. The methods currently implemented are:

top1 - Single strongest eQTL
PRS-CS - Bayesian polygenic scoring method
SBayesR - Bayesian polygenic scoring method, implemented in GCTB
DBSLMM - Bayesian polygenic scoring method
LDpred2 - Bayesian polygenic scoring method, implemented in bigsnpr
lassosum - Lasso-regularised polygenic scoring method
SuSiE - SNP fine-mapping software, with polygenic scoring applications

Essential software and data:

eQTL summary statistics
- Must have the following columns: GENE, CHR, SNP, BP, A1, A2, BETA, SE, Z, P, FREQ, N
- BP must be in genome build GRCh37/hg19
Reference plink files
- For example, 1000 Genomes Phase 3
- Split by chromosome, containing RSIDs, build GRCh37/hg19

R libraries

install.packages(c('optparse','data.table'))

GCTB software and GCTB reference data
GCTA software

Method specific software and data:

PRS-CS
- PRS-CS software
- PRS-CS reference data
DBSLMM
- DBSLMM software
- Note. compute_weights.R does not work with the latest version of DBSLMM. You must git reset to a previous branch as follows
- ```
git reset --hard aa6e7ad5b8a7d3b6905556a4007c4a1fa2925b7d
```
LDpred2
- LDpred2 reference data
- ```
install.packages(c('bigsnpr','dplyr')
```

lassosum

install.packages(c('lassosum','fdrtool'))

SuSiE
- ```
install.packages('susieR')
```

Example of running the script for a single gene:

Rscript eQTL_to_TWAS/compute_weights.R \
    --id ENSG00000206503 \
    --extract /Data/ldsc/w_hm3.snplist \
    --sumstats /Data/eQTLGen/eQTLGen_sumstats.txt \
    --gcta /Software/gcta_1.94 \
    --gctb /Software/gctb_203 \
    --gctb_ref /Data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_v3_50k_chr \
    --plink_ref_chr /Data/1KG/Phase3/1KGPhase3.w_hm3.chr \
    --plink_ref_keep /Data/1KG/Phase3/keep_files/EUR_samples.keep \
    --ld_blocks /Data/LDetect/EUR \
    --rscript Rscript \
    --dbslmm /Software/DBSLMM/software \
    --plink /Software/plink1.9 \
    --PRScs_path /Software/PRScs \
    --PRScs_ref_path /Software/PRScs/ldblk_1kg_eur \
    --ldpred2_ref_dir /Data/ldpred2_ref \
    --output /Data/eQTL_to_TWAS/Test

This will create a file called 'ENSG00000206503.RDat' in the folder '/Data/eQTL_to_TWAS/Test'. This can be parallelised across genes, or run linearly for all genes in the --sumstats file by removing the --id parameter.

opain/eQTL_to_TWAS

eQTL to TWAS

compute_weights.R

Essential software and data:

Method specific software and data:

Example of running the script for a single gene: