WORK IN PROGRESS
We assembled a repository of >400 published GWAS loci for which we have high confidence in the gene functionally implicated. Gold-standard evidence was grouped into 4 classes: (i) GWAS loci that overlap with drug target-disease pairs; (ii) expert-curated loci with strong orthogonal evidence or biological plausibility; (iii) GWAS loci that have been investigated by functional follow-up experiments (e.g. reporter assays, CRISPR/Cas9 genome editing); (iv) loci inferred from observational functional data (e.g. colocalisation with molecular QTLs and epigenetic marks). We also assigned each gold standard a confidence rating of high, medium or low depending on our assessment of the strength of supporting evidence.
Repository for GWAS variant to gene gold standards.
gold_standards/processed
contains the latest set of gold standards in a variety of formats. Note that the TSV file has arrays concatenated using '|' as a separator.
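As a minimal sketch of turning the '|'-concatenated arrays back into lists, the snippet below parses a small in-memory TSV. The column names (`gene_id`, `ancestry`, `tags`) and the choice of which columns hold arrays are assumptions made for illustration; the actual columns in the processed files may differ.

```python
import csv
import io

# Hypothetical TSV content; real column names in gold_standards/processed may differ.
EXAMPLE_TSV = "gene_id\tancestry\ttags\nENSG00000135697\tEUR\tmetabolite|mQTL\n"

# Columns assumed (for illustration only) to hold '|'-concatenated arrays.
ARRAY_COLUMNS = ("ancestry", "tags")


def read_gold_standard_tsv(handle, array_columns=ARRAY_COLUMNS):
    """Parse a TSV and split the '|'-joined array columns into lists."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for column in array_columns:
            value = row.get(column, "")
            # An empty cell becomes an empty list rather than [''].
            row[column] = value.split("|") if value else []
        rows.append(row)
    return rows


rows = read_gold_standard_tsv(io.StringIO(EXAMPLE_TSV))
print(rows[0]["tags"])
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper and adjust `array_columns` to match the file's header.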
To do:
- Scripts to produce descriptive statistics for each gold standard set
- Derive new gold standards based on drug and rare disease data
The gold standard schema consists of 5 sections:
- sentinel_variant: Information about the sentinel/lead GWAS variant
- trait_info: Information about the GWAS trait or disease
- association_info: Evidence linking the sentinel variant to the trait through GWAS
- gold_standard_info: Evidence linking the GWAS signal (variant, trait, association) to a gene
- metadata: Additional information
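A quick way to check that a record carries all five sections is sketched below. It assumes the record has already been loaded into a Python dict (for example from JSON); the helper name `missing_sections` is hypothetical and is not part of the repository's validator.

```python
# The five top-level sections named in the schema description.
REQUIRED_SECTIONS = (
    "sentinel_variant",
    "trait_info",
    "association_info",
    "gold_standard_info",
    "metadata",
)


def missing_sections(record):
    """Return the schema sections absent from a gold-standard record (a dict)."""
    return [section for section in REQUIRED_SECTIONS if section not in record]


# Hypothetical, incomplete record used only for illustration.
record = {"sentinel_variant": {}, "trait_info": {}, "metadata": {}}
print(missing_sections(record))
```

This only tests for the presence of the section keys; it says nothing about whether their contents are valid.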
# Sentinel variant information, alleles must match gnomAD
sentinel_variant:
  # Locus chrom and position on either GRCh37 or GRCh38 are required
  locus_GRCh37:
    chromosome: '16'
    position: 81264597
  locus_GRCh38:
    chromosome: '16'
    position: 81230992
  # Alleles are required
  alleles:
    alternative: G
    reference: T
  # rsID is optional
  rsid: rs6564851
# Trait/disease information
trait_info:
  # List of ontology codes for the trait, should be EFO if possible
  ontology:
    - HMDB0000561
  # Trait reported by the author and standardised trait name (from ontology)
  reported_trait_name: Carotenoid and tocopherol levels (beta-carotene)
  standard_trait_name: B-Carotene
# Association evidence
association_info:
  # List of ancestries in which the association was detected
  ancestry:
    - EUR
  # GWAS Catalog study ID if available
  gwas_catalog_id: GCST000324
  # Open Targets Genetics study ID if available
  otg_id: GCST000324
  # Negative log p-value (optional)
  neg_log_pval: 23.699
  # Pubmed ID or doi
  pubmed_id: '19185284'
  doi: '10.1101/592238'
# Gold standard evidence
gold_standard_info:
  # Ensembl gene ID
  gene_id: ENSG00000135697
  # List of evidence supporting the link with the gene
  evidence:
    # Item 1 in list
    - class: expert curated # See below for evidence classes
      # Confidence should be "High" or "Low"
      confidence: High
      # Evidence curator
      curated_by: EF
      # Description of evidence
      description: BCO1 (previously referred to as BCMO1) encodes beta-carotene oxygenase
        1 which uses a molecule of oxygen to produce two molecules of retinol from
        beta-carotene. Enzyme deficiency results in accumulation of beta-carotene.
      # Pubmed ID or source
      pubmed_id: '11401432'
      source: ChEMBL drug data
# Metadata
metadata:
  date_added: '2019-05-17'
  reviewed_by: EM
  # Name given to the group of gold standards
  set_label: ProGeM
  submitted_by: EF
  # Additional tags that may be useful for analysis
  tags:
    - metabolite
    - mQTL
  comments: 'No comments'
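The comments in the example above state two requirements for `sentinel_variant`: a locus on at least one genome build, and both alleles. The sketch below checks just those two rules on a dict; the function name is hypothetical, it does not compare alleles against gnomAD, and it is not a substitute for validation against the repository's JSON schema.

```python
def sentinel_variant_problems(sentinel_variant):
    """List violations of the two rules stated in the example's comments."""
    problems = []

    # A locus (chromosome and position) is needed on GRCh37 or GRCh38.
    has_locus = False
    for build in ("locus_GRCh37", "locus_GRCh38"):
        locus = sentinel_variant.get(build) or {}
        if locus.get("chromosome") is not None and locus.get("position") is not None:
            has_locus = True
    if not has_locus:
        problems.append("no complete locus on GRCh37 or GRCh38")

    # Both alleles are required; rsid is optional and therefore not checked.
    alleles = sentinel_variant.get("alleles") or {}
    for key in ("reference", "alternative"):
        if not alleles.get(key):
            problems.append(f"missing {key} allele")

    return problems


# Values taken from the example record; only one build is supplied here.
example = {
    "locus_GRCh38": {"chromosome": "16", "position": 81230992},
    "alleles": {"alternative": "G", "reference": "T"},
    "rsid": "rs6564851",
}
print(sentinel_variant_problems(example))
```

An empty list means neither rule was violated.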
- "expert curated": association curated by an expert
- "functional experimental": association inferred from experimental alteration (intervention), e.g. CRISPR editing
- "functional observational": association inferred from observational evidence, e.g. correlation with quantitative trait such as eQTL or pQTL
- "drug": association inferred from known drug target-indication pairs
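The four class names above can be treated as a closed vocabulary. The following sketch flags any `class` value in a record's evidence list that is not one of them; it assumes the record is available as a dict, and the helper name is hypothetical.

```python
# The four evidence classes listed above.
EVIDENCE_CLASSES = frozenset({
    "expert curated",
    "functional experimental",
    "functional observational",
    "drug",
})


def invalid_evidence_classes(gold_standard_info):
    """Return evidence class values that are not one of the four listed classes."""
    return [
        item.get("class")
        for item in gold_standard_info.get("evidence", [])
        if item.get("class") not in EVIDENCE_CLASSES
    ]


# Hypothetical record: the second evidence item uses an unlisted class.
info = {
    "gene_id": "ENSG00000135697",
    "evidence": [{"class": "expert curated"}, {"class": "eQTL"}],
}
print(invalid_evidence_classes(info))
```

The comparison is case-sensitive, which is an assumption; the schema itself is the authority on accepted values.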
- Download the template yaml from here if submitting a single gold standard or here if submitting multiple at once
- Fill in the yaml file
- Create a new issue including the completed yaml file
We will then review the submission and add it to the repository.
This section contains instructions for validating and processing newly submitted gold standards. Submitters are not required to validate and process new gold standards.
# Set up environment
conda env create -n goldstandards --file environment.yaml
conda activate goldstandards
# Validate against schema (input can be json or yaml)
python validation/validator.py \
--input temp/progem/progem.190517.yaml \
--schema validation/goldstandard_schema.v1.4.json
# Convert to json
python utils/json_yaml_converter.py \
--input temp/progem/progem.190517.yaml \
--output temp/progem/progem.190517.json
# Add to `gold_standards/unprocessed_validated`
mv temp/progem/progem.190517.json \
gold_standards/unprocessed_validated/progem.190517.json
# Process all gold standards in `gold_standards/unprocessed_validated`
nano processing/process_and_convert_formats.sh # Edit Args
bash processing/process_and_convert_formats.sh