CAPICE: a computational method for Consequence-Agnostic Pathogenicity Interpretation of Clinical Exome variations

CAPICE is a computational method for predicting the pathogenicity of SNVs and InDels. It is a gradient boosting tree model trained on the genomic annotations used by the CADD score, with clinical significance as the training label. CAPICE performs consistently across diverse independent synthetic and real clinical data sets. It outperforms the current best method in pathogenicity estimation for variants of different molecular consequences and allele frequencies.

The software can be used as a web service, through precomputed scores, or by installing it locally. All three options are described below.

Use online web service

CAPICE can be used as an online service at http://molgenis.org/capice

Download files of precomputed scores for all possible SNVs and InDels (based on GRCh37)

We precomputed the CAPICE score for all possible SNVs and InDels. The scores can be downloaded via Zenodo.

The file contains the following columns:

#CHROM: chromosome name, as [1:22, X]
POS: genomic position (GRCh37 genome assembly)
REF: reference allele
ALT: alternative allele
score: CAPICE score, ranging from 0 to 1; the higher the score, the more likely the variant is pathogenic
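The column layout above can be read with a few lines of Python. This is only a minimal sketch: it assumes the file is tab-separated with a header line starting with #CHROM, which the description above does not state explicitly. The function name read_capice_scores, the 0.5 cut-off, and the sample rows are hypothetical and chosen purely for illustration.

```python
import csv
import io


def read_capice_scores(handle, min_score=0.0):
    """Yield (chrom, pos, ref, alt, score) tuples with score >= min_score.

    Assumes tab-separated columns in the order #CHROM, POS, REF, ALT, score.
    """
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        # Skip blank lines and the header line, which starts with '#CHROM'.
        if not row or row[0].startswith("#"):
            continue
        chrom, pos, ref, alt, score = row[:5]
        score = float(score)
        if score >= min_score:
            yield chrom, int(pos), ref, alt, score


# Made-up rows for illustration only; these are not real CAPICE scores.
sample = io.StringIO(
    "#CHROM\tPOS\tREF\tALT\tscore\n"
    "1\t12345\tA\tG\t0.02\n"
    "X\t67890\tC\tT\t0.91\n"
)

# 0.5 is an arbitrary example threshold, not a recommended cut-off.
likely_pathogenic = list(read_capice_scores(sample, min_score=0.5))
print(likely_pathogenic)
```

For a real download, the in-memory sample would be replaced by an opened file handle (for example via gzip.open(path, "rt") if the file is gzip-compressed, which is also an assumption).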

Install CAPICE software locally

The CAPICE software is also provided in this repository for running CAPICE in your own environment. The following sections guide you through annotating variants and making predictions with the CAPICE model.

Requirements

Python 3.6 (doesn't work with 3.7 or 3.8)
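Since only Python 3.6 is supported, it can help to check the interpreter before running anything. The snippet below is a small sketch of such a check; the helper name is_supported_python is hypothetical and is not part of the CAPICE scripts.

```python
import sys

# The only minor version stated as working in the requirements above.
SUPPORTED = (3, 6)


def is_supported_python(version_info=sys.version_info):
    """Return True if the major.minor version is exactly 3.6."""
    return tuple(version_info[:2]) == SUPPORTED


if not is_supported_python():
    # Warn rather than exit, so the check can be imported without side effects.
    print(
        "Warning: CAPICE requires Python 3.6, but this interpreter is "
        "{}.{}".format(sys.version_info[0], sys.version_info[1])
    )
```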

Downloads, installation and processing of the input files

Software and libraries

CAPICE scripts can be downloaded from the CAPICE GitHub repository. The CAPICE model can be downloaded via #tbd

git clone https://github.com/molgenis/capice.git
cd capice

Variant annotation and input file format

CAPICE uses the same set of features as CADD. In this repository we also provide an example input variant list in CAPICE_example/test_input.vcf and the annotated input file in CAPICE_example/test_caddAnnotated.tsv.gz

Perform prediction

Once the annotated file is ready, the last step is to run the pre-trained model provided in the GitHub repository.

bash predict.sh \
/path/to/input \
/path/to/CAPICE_model \
/path/to/output \
/path/to/log_file
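When predictions are run from a Python pipeline, the same command can be launched with the standard subprocess module. The following is a sketch under stated assumptions: it assumes the working directory is the cloned capice directory so that predict.sh is found, and the helper names build_predict_command and run_prediction are hypothetical. The argument order mirrors the command above (input, model, output, log file).

```python
import subprocess


def build_predict_command(input_path, model_path, output_path, log_path):
    """Build the predict.sh argument list in the order shown above."""
    return ["bash", "predict.sh", input_path, model_path, output_path, log_path]


def run_prediction(input_path, model_path, output_path, log_path):
    """Run predict.sh and raise CalledProcessError on a non-zero exit code."""
    cmd = build_predict_command(input_path, model_path, output_path, log_path)
    subprocess.run(cmd, check=True)


# Show the command that would be executed, using the placeholder paths.
print(" ".join(build_predict_command(
    "/path/to/input",
    "/path/to/CAPICE_model",
    "/path/to/output",
    "/path/to/log_file",
)))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.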