This pipeline facilitates easy usage of coloc
(Giambartolomei et al. 2014, Wallace 2020) with GWAS and eQTL data.
Colocalization analysis tests whether two traits, here a GWAS trait and the expression of a gene, share a causal genetic variant in a given region.
Coloc-wrapper performs genetic colocalization analysis for GWAS and eQTL datasets in a given region using the coloc.abf()
function from the coloc R package. It calculates posterior probabilities for the following five hypotheses for each gene in the region, under the assumption of a single causal variant for each trait:
H0
: no association
H1
: association to trait 1 only
H2
: association to trait 2 only
H3
: association to both traits, distinct causal variants
H4
: association to both traits, shared causal variant
The posterior probability of hypothesis 4, PP4, measures the support for colocalization. A commonly used threshold for calling colocalization is PP4 > 0.8.
To get started, look at this minimal example.
- R version >3.6.2
- R-packages: "coloc", "data.table", "ggplot2", "optparse", "R.utils"
- tabix
sudo docker build -t coloc-wrapper -f docker/Dockerfile .
sudo docker run -it -v /mnt/disks/1/projects/COLOC:/COLOC -w /COLOC coloc-wrapper /bin/bash
The input files are the following:
eQTL data can be found here: https://www.ebi.ac.uk/eqtl/
Running coloc-wrapper involves two steps:
- Trimming data, both GWAS and eQTL, according to a predefined region
- Running coloc
file
: GWAS or eQTL file path or URL
region
: genomic region of interest, format chr:start-end
out
: output file
Rscript extdata/step1_subset_data.R \
--file=ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/Lepik_2017/ge/Lepik_2017_ge_blood.all.tsv.gz \
--region="1:10565520-10965520" \
--out=tmp.txt
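To make the `--region` format concrete, the sketch below splits a `chr:start-end` string with shell parameter expansion and keeps the rows of a small tab-separated table that fall inside it. It is an awk stand-in for illustration only, not how `step1_subset_data.R` is implemented; the toy table and its `chromosome` and `position` column names are assumptions.

```shell
#!/bin/sh
# Hypothetical toy table; the column names are assumptions for this sketch.
tmp_in=$(mktemp)
printf 'chromosome\tposition\tpvalue\n' >  "$tmp_in"
printf '1\t10565519\t0.5\n'            >> "$tmp_in"
printf '1\t10565520\t0.01\n'           >> "$tmp_in"
printf '1\t10965520\t0.2\n'            >> "$tmp_in"
printf '2\t10700000\t0.03\n'           >> "$tmp_in"

region="1:10565520-10965520"

# Split "chr:start-end" into its three parts.
chr=${region%%:*}     # text before the colon
range=${region#*:}    # text after the colon
start=${range%-*}     # text before the dash
end=${range#*-}       # text after the dash

# Keep the header plus rows on that chromosome with start <= position <= end.
subset=$(awk -F '\t' -v chr="$chr" -v lo="$start" -v hi="$end" \
  'NR == 1 || ($1 == chr && $2 + 0 >= lo + 0 && $2 + 0 <= hi + 0)' "$tmp_in")

printf '%s\n' "$subset"
rm -f "$tmp_in"
```

Both region boundaries are treated as inclusive here, which is a choice of this sketch. For bgzipped and indexed files, `tabix` (listed in the requirements) can look up a region in the same `chr:start-end` notation without scanning the whole file.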
gwas
: GWAS summary statistics file for one region
eqtl
: eQTL summary statistics file for one region
header_gwas
: header of GWAS file, as a named R vector in quotes
header_eqtl
: header of eQTL file, as a named R vector in quotes
info_gwas
: options for the GWAS dataset, as an R list in quotes (more info in the coloc package documentation), with elements:
  - type: the type of data in the dataset, either "quant" (quantitative) or "cc" (case-control)
  - s: for a case-control dataset, the proportion of samples in the dataset that are cases
  - N: number of samples in the dataset
info_eqtl
: options for the eQTL dataset, as an R list in quotes (more info in the coloc package documentation), with elements:
  - type: the type of data in the dataset, either "quant" (quantitative) or "cc" (case-control)
  - sdY: for a quantitative trait, the population standard deviation of the trait. If not given, it can be estimated from the vectors of varbeta and MAF
  - N: number of samples in the dataset
p1
: the prior probability that any random SNP in the region is associated with trait 1 only
p2
: the prior probability that any random SNP in the region is associated with trait 2 only
p12
: the prior probability that any random SNP in the region is associated with both traits
locuscompare_thresh
: PP4 threshold above which locuscompare plots are drawn
out
: output file
Rscript extdata/step2_run_coloc.R \
--eqtl="extdata/Lepik_2017_ge_blood_chr1_ENSG00000142655_ENSG00000130940.all.tsv" \
--gwas="extdata/I9_VARICVE_chr1.tsv.gz" \
--header_eqtl="c(varid = 'rsid', pvalues = 'pvalue', MAF = 'maf', gene_id = 'gene_id')" \
--header_gwas="c(varid = 'rsids', pvalues = 'pval', MAF = 'maf')" \
--info_gwas="list(type = 'cc', s = 11006/(11006 + 117692), N = 11006 + 117692)" \
--info_eqtl="list(type = 'quant', sdY = 1, N = 491)" \
--p1=1e-4 \
--p2=1e-4 \
--p12=5e-6 \
--locuscompare_thresh=0.8 \
--out="Coloc_example.txt"
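As a worked check of the `s` and `N` options for a case-control dataset: `s` is the proportion of samples that are cases, so from case and control counts it is cases / (cases + controls), and `N` is their sum. The sketch below does that arithmetic, assuming the two numbers in the example command (11006 and 117692) are the case and control counts; that reading is an assumption, and the variable names are hypothetical.

```shell
#!/bin/sh
# Assumed reading of the example command: 11006 cases, 117692 controls.
n_cases=11006
n_controls=117692

# N is the total number of samples.
n_total=$((n_cases + n_controls))

# s is the proportion of samples that are cases: cases / (cases + controls).
s=$(awk -v a="$n_cases" -v n="$n_total" 'BEGIN { printf "%.4f", a / n }')

echo "N = $n_total, s = $s"
```

Dividing cases by controls instead of by the total would overstate `s`, so it is worth checking which denominator is used when filling in `--info_gwas`.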
gene_id
: gene identifier
nsnps
: number of SNPs included in colocalization
PP.H0.abf
: posterior probability that neither trait has a genetic association in the region
PP.H1.abf
: posterior probability that only trait 1 has a genetic association in the region
PP.H2.abf
: posterior probability that only trait 2 has a genetic association in the region
PP.H3.abf
: posterior probability that both traits are associated, but with different causal variants
PP.H4.abf
: posterior probability that both traits are associated and share a single causal variant
For more details on the output columns, see the coloc package documentation.
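A typical follow-up is to list the genes whose PP4 exceeds a chosen threshold. The sketch below does this with awk on a toy table. It assumes the output file is tab-separated with a header row naming the columns listed above; that layout, the gene names, and all numbers are made up for illustration and are not real coloc results.

```shell
#!/bin/sh
# Toy results table; the tab-separated layout and all values are
# assumptions made for this sketch, not real coloc output.
results=$(mktemp)
printf 'gene_id\tnsnps\tPP.H0.abf\tPP.H1.abf\tPP.H2.abf\tPP.H3.abf\tPP.H4.abf\n' > "$results"
printf 'GENE_A\t500\t0.01\t0.02\t0.01\t0.05\t0.91\n' >> "$results"
printf 'GENE_B\t480\t0.10\t0.60\t0.05\t0.15\t0.10\n' >> "$results"

thresh=0.8

# Locate the PP.H4.abf column by name, then print genes above the threshold.
hits=$(awk -F '\t' -v thresh="$thresh" '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "PP.H4.abf") col = i; next }
  col && $col + 0 > thresh + 0 { print $1 "\t" $col }
' "$results")

printf '%s\n' "$hits"
rm -f "$results"
```

Looking the column up by name rather than by position keeps the filter working if the column order in the output differs from the one assumed here.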
An alternative coloc-wrapper: https://github.com/eQTL-Catalogue/colocalisation
Rscript -e 'testthat::test_dir("tests/testthat/")'
- Original coloc paper: Giambartolomei, C., Vukcevic, D., Schadt, E.E., Franke, L., Hingorani, A.D., Wallace, C., Plagnol, V., 2014. Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics. PLOS Genetics 10, e1004383. https://doi.org/10.1371/journal.pgen.1004383
- Updates on coloc: Wallace, C., 2020. Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses. PLOS Genetics 16, e1008720. https://doi.org/10.1371/journal.pgen.1008720
- Importance of visualizing locus: Liu, B., Gloudemans, M.J., Rao, A.S., Ingelsson, E., Montgomery, S.B., 2019. Abundant associations with gene expression complicate GWAS follow-up. Nat Genet 51, 768–769. https://doi.org/10.1038/s41588-019-0404-0 (see also locuscomparer)