/eQTL_to_TWAS

Methods for converting eQTL summary statistics into TWAS SNP-weights

Primary LanguageRGNU General Public License v3.0GPL-3.0

eQTL to TWAS

This repo contains code for deriving TWAS models from eQTL summary statistics. The same code can be used for other molecular features as well (e.g. methylation, protein, chromatin). The models are in FUSION format.

Please refer to and cite our paper comparing approaches for integration of eQTL summary statistics with GWAS summary statistics.


compute_weights.R

This script uses eQTL summary statistics to create TWAS models in a format consistent with FUSION.

The essential software and data requirements vary according to the methods selected to create models. The methods currently implemented are:

  • top1 - Single strongest eQTL
  • PRS-CS - Bayesian polygenic scoring method
  • SBayesR - Bayesian polygenic scoring method, implemented in GCTB
  • DBSLMM - Bayesian polygenic scoring method
  • LDpred2 - Bayesian polygenic scoring method, implemented in bigsnpr
  • lassosum - Lasso-regularised polygenic scoring method
  • SuSiE - SNP fine-mapping software, with polygenic scoring applications

Essential software and data:
  • eQTL summary statistics

    • Must have the following columns: GENE, CHR, SNP, BP, A1, A2, BETA, SE, Z, P, FREQ, N
    • BP must be in genome build GRCh37/hg19
  • Reference plink files

    • For example, 1000 Genomes Phase 3
    • Split by chromosome, containing RSIDs, build GRCh37/hg19
  • R libraries

    install.packages(c('optparse','data.table'))
    
  • GCTB software and GCTB reference data

  • GCTA software


Method specific software and data:
  • PRS-CS

  • DBSLMM

    • DBSLMM software
    • Note. compute_weights.R does not work with the latest version of DBSLMM. You must git reset to a previous branch as follows
    • git reset --hard aa6e7ad5b8a7d3b6905556a4007c4a1fa2925b7d
      
  • LDpred2

  • lassosum

    • install.packages(c('lassosum','fdrtool'))
      
  • SuSiE

    • install.packages('susieR')
      

Example of running the script for a single gene:
Rscript eQTL_to_TWAS/compute_weights.R \
    --id ENSG00000206503 \
    --extract /Data/ldsc/w_hm3.snplist \
    --sumstats /Data/eQTLGen/eQTLGen_sumstats.txt \
    --gcta /Software/gcta_1.94 \
    --gctb /Software/gctb_203 \
    --gctb_ref /Data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_v3_50k_chr \
    --plink_ref_chr /Data/1KG/Phase3/1KGPhase3.w_hm3.chr \
    --plink_ref_keep /Data/1KG/Phase3/keep_files/EUR_samples.keep \
    --ld_blocks /Data/LDetect/EUR \
    --rscript Rscript \
    --dbslmm /Software/DBSLMM/software \
    --plink /Software/plink1.9 \
    --PRScs_path /Software/PRScs \
    --PRScs_ref_path /Software/PRScs/ldblk_1kg_eur \
    --ldpred2_ref_dir /Data/ldpred2_ref \
    --output /Data/eQTL_to_TWAS/Test

This will create a file called 'ENSG00000206503.RDat' in the folder '/Data/eQTL_to_TWAS/Test'. This can be parallelised across genes, or run linearly for all genes in the --sumstats file by removing the --id parameter.