Web browser: http://cpag.oit.duke.edu/
This repo contains both the backend and frontend code for iCPAGdb and its web browser.
iCPAGdb integrates the results of GWAS across different phenotypic scales, identifying pleiotropic loci that impact molecular, cellular, and organismal traits and quantifying their significance. The goal is to provide a resource that allows experts on a particular human trait to easily develop hypotheses about the molecular and cellular phenotypes that underlie the physiology of that trait. Molecules and cellular pathways implicated in this way could serve as novel biomarkers or as targets for therapeutic approaches. The current version of iCPAGdb contains GWAS summary statistics for >4400 diseases/traits. It allows users to explore pre-computed correlations across all existing diseases/traits and/or to upload their own GWAS and identify the SNPs it shares with the >4400 diseases/traits.
This repo contains two parts:
- Python (3.6+) code for iCPAGdb
- R Shiny code for the web browser
We added the --lddb-r2 parameter to let users choose a different LD proxy database. However, because the pre-built GWAS datasets were clumped by PLINK using --clump-r2 0.4 for each study, we recommend keeping the default: --lddb-r2 0.4.
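As a rough illustration of how the LD population and r2 choice relate to the files in the "db" folder, the sketch below builds the expected database file name. It follows the naming pattern visible in the folder structure of this README (for example cpag_gwasumstat_v1.1.EUR_ld0.4.db). The helper lddb_path is hypothetical, and so is the assumption that main.py selects files this way; check main.py for the actual logic.

```python
from pathlib import Path

# Populations and r2 thresholds for which pre-built databases appear in the "db" folder.
SUPPORTED_POPS = {"AFR", "EAS", "EUR"}
SUPPORTED_R2 = {"0.2", "0.4", "0.8"}


def lddb_path(db_dir: str, pop: str = "EUR", r2: str = "0.4", version: str = "v1.1") -> Path:
    """Return the path of the LD-proxy database for a population and r2 threshold.

    Hypothetical helper: it only mirrors the naming pattern
    cpag_gwasumstat_<version>.<POP>_ld<r2>.db seen in the db folder.
    r2 is a string so that "0.4" is not reformatted as a float.
    """
    pop = pop.upper()
    if pop not in SUPPORTED_POPS:
        raise ValueError(f"unsupported population {pop!r}; expected one of {sorted(SUPPORTED_POPS)}")
    if r2 not in SUPPORTED_R2:
        raise ValueError(f"unsupported r2 {r2!r}; expected one of {sorted(SUPPORTED_R2)}")
    return Path(db_dir) / f"cpag_gwasumstat_{version}.{pop}_ld{r2}.db"


if __name__ == "__main__":
    # The recommended default (r2 = 0.4) matches the --clump-r2 0.4 used to build the datasets.
    print(lddb_path("db", "EUR", "0.4"))
```

Keeping the accepted values in explicit sets makes an unsupported r2 (such as 0.5, for which no database is listed) fail early instead of pointing at a missing file.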
- Download PLINK 1.9 directly, or with wget on Linux/Mac, then unzip it and place the binaries (plink and prettify) in the "plink_bins" folder.
## Choose the PLINK build for your operating system; this example uses the Linux version
wget http://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20201019.zip
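If you prefer to do the unzip-and-copy step from Python, here is a minimal sketch. It assumes the downloaded zip contains files named plink and prettify, as the plink_bins folder in the structure below suggests. install_plink is a hypothetical helper, not part of pyCPAGdb.

```python
import os
import stat
import zipfile
from pathlib import Path


def install_plink(zip_path: str, dest_dir: str = "plink_bins", members=("plink", "prettify")) -> list:
    """Extract the named PLINK binaries from a PLINK 1.9 zip into dest_dir.

    Members are matched by base name, so any directories inside the zip are
    flattened. Each extracted file is marked executable.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    installed = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            name = os.path.basename(info.filename)
            if info.is_dir() or name not in members:
                continue  # skip LICENSE, toy files, and directory entries
            target = dest / name
            with zf.open(info) as src, open(target, "wb") as out:
                out.write(src.read())
            # Add execute permission for user, group, and others.
            target.chmod(target.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
            installed.append(target)
    return installed
```

Usage, after the wget above: install_plink("plink_linux_x86_64_20201019.zip").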
- Download the zipped database file (~33 GB) from the Dropbox LINK and decompress it into the "db" folder. Here is an example of downloading the required database with wget on Linux/Mac OS:
wget https://www.dropbox.com/sh/na23jflxcgk0nib/AAAKi--r8cS44U8VboFWBTP2a/cpag_gwasumstat_v1.1.EUR_ld0.4.db --content-disposition
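Because the database is large, it is worth checking that the download completed and produced a real SQLite file. An interrupted transfer, or an HTML page served in place of the file, would fail this check. The sketch assumes the .db files are SQLite databases, which the sqlite dependency listed later suggests. check_sqlite_db is a hypothetical helper; the table names it reports are simply whatever the file contains.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Every SQLite 3 database file begins with these 16 bytes.
SQLITE_MAGIC = b"SQLite format 3\x00"


def check_sqlite_db(path: str) -> list:
    """Raise ValueError unless path is an SQLite 3 file; otherwise return its table names."""
    with open(path, "rb") as fh:
        header = fh.read(len(SQLITE_MAGIC))
    if header != SQLITE_MAGIC:
        raise ValueError(f"{path} does not look like an SQLite database; the download may be incomplete")
    # Open read-only through a URI so the check never modifies the database.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Usage: check_sqlite_db("db/cpag_gwasumstat_v1.1.EUR_ld0.4.db"). The header check is cheap even for a 33 GB file, since only 16 bytes are read.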
The final folder structure, containing all required code and data files, is:
pyCPAGdb
├── _utils.py
├── anno_parent_efo.py
├── main.py
├── stats.py
├── plink_bins
│   ├── plink
│   └── prettify
├── db
│   ├── cpag_gwasumstat_v1.1.AFR_ld0.2.db
│   ├── cpag_gwasumstat_v1.1.AFR_ld0.4.db
│   ├── cpag_gwasumstat_v1.1.AFR_ld0.8.db
│   ├── cpag_gwasumstat_v1.1.EAS_ld0.2.db
│   ├── cpag_gwasumstat_v1.1.EAS_ld0.4.db
│   ├── cpag_gwasumstat_v1.1.EAS_ld0.8.db
│   ├── cpag_gwasumstat_v1.1.EUR_ld0.2.db
│   ├── cpag_gwasumstat_v1.1.EUR_ld0.4.db
│   ├── cpag_gwasumstat_v1.1.EUR_ld0.8.db
│   ├── cpag_gwasumstat_v1.2.db
│   ├── gwas-efo-trait-mappings.txt
│   └── lddat
│       ├── AFR_1kg_20130502_maf01.bed
│       ├── AFR_1kg_20130502_maf01.bim
│       ├── AFR_1kg_20130502_maf01.fam
│       ├── EAS_1kg_20130502_maf01.bed
│       ├── EAS_1kg_20130502_maf01.bim
│       ├── EAS_1kg_20130502_maf01.fam
│       ├── EUR_1kg_20130502_maf01.bed
│       ├── EUR_1kg_20130502_maf01.bim
│       └── EUR_1kg_20130502_maf01.fam
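Before running anything, it can help to confirm that the layout above is in place. The sketch below checks a subset of the listed paths. The list REQUIRED and the function missing_files are hypothetical. Which files a given analysis actually needs is an assumption; here it is a EUR analysis with the default r2 = 0.4 database.

```python
from pathlib import Path

# A subset of the folder structure above, assuming a EUR analysis with the default
# LD database (r2 = 0.4); add the other population/r2 files you intend to use.
REQUIRED = [
    "main.py",
    "plink_bins/plink",
    "db/gwas-efo-trait-mappings.txt",
    "db/cpag_gwasumstat_v1.1.EUR_ld0.4.db",
    "db/lddat/EUR_1kg_20130502_maf01.bed",
    "db/lddat/EUR_1kg_20130502_maf01.bim",
    "db/lddat/EUR_1kg_20130502_maf01.fam",
]


def missing_files(root: str = ".", required=REQUIRED) -> list:
    """Return the entries of required that do not exist under root."""
    base = Path(root)
    return [rel for rel in required if not (base / rel).exists()]


if __name__ == "__main__":
    # Run from inside the pyCPAGdb folder.
    missing = missing_files()
    print("folder layout looks complete" if not missing else "missing:\n  " + "\n  ".join(missing))
```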
- Configure the computing environment for Python 3
The fastest way is to install Miniconda and then install the required packages with conda.
Create a new environment using conda:
conda create -n icpagdb python=3.7
conda activate icpagdb
Install the Python packages using conda:
conda install -c conda-forge pandas
conda install -c conda-forge scipy
conda install -c conda-forge joblib
conda install -c conda-forge tqdm
conda install -c conda-forge sqlite
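After installing, you can confirm that the active environment can import the packages with a check like the one below. This is a sketch: the module list mirrors the conda packages above, with sqlite3 as Python's built-in interface to the SQLite library. missing_modules is a hypothetical helper.

```python
import importlib.util

# Modules corresponding to the conda packages installed above.
REQUIRED_MODULES = ["pandas", "scipy", "joblib", "tqdm", "sqlite3"]


def missing_modules(modules=REQUIRED_MODULES) -> list:
    """Return the names of modules that cannot be found in the active environment."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("all packages found" if not missing else "missing: " + ", ".join(missing))
```

find_spec only locates each module without importing it, so the check stays fast even for heavy packages such as pandas and scipy.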
Serum metabolites/xenobiotics (Shin et al. 2014) vs. human diseases
python main.py cpagdb --threads 2 --subtype NHGRI --NHGRI-Pcut 5e-8 \
--subtype BloodMetabolites,BloodXenobiotic --Pcut 1e-5 \
--lddb-pop EUR --outfile NHGRI-p1e-05-BloodMetabolitesXenobiotic-p1e-05-EUR.csv
Then annotate the phenotypes:
python main.py post_analysis --anno-ontology --anno-cols Trait1 \
--infile output/NHGRI-p1e-05-BloodMetabolitesXenobiotic-p1e-05-EUR.csv \
--outfile NHGRI-p1e-05-BloodMetabolitesXenobiotic-p1e-05-EUR.csv
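Conceptually, and as an assumption based on the description at the top of this README, the cpagdb command works in two steps. For each trait, it selects the SNPs that pass the trait's p-value cutoff (here --NHGRI-Pcut 5e-8 for NHGRI traits and --Pcut 1e-5 for the blood metabolite/xenobiotic traits). It then scores the overlap between pairs of traits.

The sketch below is a generic illustration, not the code in stats.py. It scores the overlap of two SNP sets with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test on the 2x2 table, and also reports a Jaccard index. It ignores the LD-proxy matching controlled by the --lddb-* options. The background size n_background is an input you must supply.

```python
from math import exp, lgamma


def log_comb(n: int, k: int) -> float:
    """Natural logarithm of the binomial coefficient C(n, k), for 0 <= k <= n."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def overlap_pvalue(snps_a: set, snps_b: set, n_background: int) -> float:
    """One-sided hypergeometric test: P(X >= observed number of shared SNPs).

    snps_a / snps_b are the SNPs passing the p-value cutoff for each trait, and
    n_background is the number of SNPs that could have been selected in both studies.
    """
    if n_background < len(snps_a | snps_b):
        raise ValueError("background must contain every selected SNP")
    k_a, k_b = len(snps_a), len(snps_b)
    shared = len(snps_a & snps_b)
    log_total = log_comb(n_background, k_b)
    p = 0.0
    for i in range(shared, min(k_a, k_b) + 1):
        if k_b - i > n_background - k_a:
            continue  # not enough non-A SNPs for this split; probability is zero
        p += exp(log_comb(k_a, i) + log_comb(n_background - k_a, k_b - i) - log_total)
    return min(p, 1.0)


def jaccard(snps_a: set, snps_b: set) -> float:
    """Shared SNPs divided by all SNPs selected for either trait."""
    union = snps_a | snps_b
    return len(snps_a & snps_b) / len(union) if union else 0.0
```

For example, two identical 3-SNP sets drawn from a 10-SNP background give P = 1/C(10, 3) = 1/120. Working in log space with lgamma avoids overflow when the background contains millions of SNPs.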
python main.py cpagdb --threads 2 --subtype H2P2 --H2P2-Pcut 1e-7 \
--lddb-pop EUR --outfile output/H2P2-p1e-07-EUR.csv
Download the COVID-19 GWAS example from the "Upload and compute CPAG" page (HERE):
python main.py usr-gwas --threads 10 --infile iCPAGdb-Sample-GWAS-top_EllinghausPCs_covid19.csv \
--SNPcol "avsnp150" --delimitor "," --Pcol "p_value" \
--usr-pcut 1e-5 \
--outfile top_EllinghausPCs_covid19_pcut1e-5_icpagdb_out.csv
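The --SNPcol, --Pcol, --delimitor, and --usr-pcut options name the SNP ID column, the p-value column, the field delimiter, and the p-value cutoff. The sketch below mimics that selection step. It is useful for checking how many of your SNPs pass the cutoff before running the comparison. top_snps is a hypothetical helper, not part of main.py, and whether iCPAGdb applies the cutoff as < or <= is an assumption.

```python
import csv


def top_snps(path: str, snp_col: str = "avsnp150", p_col: str = "p_value",
             delimiter: str = ",", pcut: float = 1e-5) -> set:
    """Return the SNP IDs whose p-value is below pcut in a delimited GWAS file."""
    selected = set()
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter=delimiter)
        missing = {snp_col, p_col} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"column(s) not found in {path}: {sorted(missing)}")
        for row in reader:
            try:
                p = float(row[p_col])
            except (TypeError, ValueError):
                continue  # skip rows with a missing or non-numeric p-value
            if p < pcut:
                selected.add(row[snp_col].strip())
    return selected
```

Usage: len(top_snps("iCPAGdb-Sample-GWAS-top_EllinghausPCs_covid19.csv")) gives the number of SNPs passing 1e-5 under these assumptions.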
Then annotate the phenotypes:
python main.py post_analysis --anno-ontology --anno-cols Trait2 \
--infile top_EllinghausPCs_covid19_pcut1e-5_icpagdb_out.csv \
--outfile top_EllinghausPCs_covid19_pcut1e-5_icpagdb_out_addEFO.csv
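The post_analysis --anno-ontology step adds ontology annotation for the column named in --anno-cols. It presumably draws on db/gwas-efo-trait-mappings.txt and anno_parent_efo.py. The sketch below illustrates the idea: each reported trait is mapped to its EFO parent terms, and the terms are appended as a new column. Several details are assumptions: the column names "Disease trait" and "Parent term" follow the GWAS Catalog trait-mapping file, and the file is taken to be tab-delimited. Both helpers and the output column name EFO_parent are hypothetical.

```python
import csv
from collections import defaultdict


def load_parent_terms(path: str, trait_col: str = "Disease trait",
                      parent_col: str = "Parent term") -> dict:
    """Map each reported trait to the sorted list of its EFO parent terms."""
    parents = defaultdict(set)
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            trait, parent = row[trait_col].strip(), row[parent_col].strip()
            if trait and parent:
                parents[trait].add(parent)
    return {trait: sorted(terms) for trait, terms in parents.items()}


def annotate(rows: list, trait_col: str, parents: dict, out_col: str = "EFO_parent") -> list:
    """Return copies of rows with a ';'-joined parent-term column ('' when unmapped)."""
    return [{**row, out_col: ";".join(parents.get(row[trait_col], []))} for row in rows]
```

A set per trait collapses duplicate parent terms, since one trait can map to several EFO terms that share a parent.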