/vep

Recipe for VEP + Cache

This repository was forked to test workflows developed by Matthias Munz. Further releases/updates are provided by https://github.com/buschlab/vep

Containerized Variant Effect Predictor (VEP) + Cache

Twitter

 + Introduction
 + Building image with Singularity
 + Run VEP
    |-- More options
    |-- Examples
 + Post-processing
    |-- Split VEP
    |-- Filtering by VEP annotations
 + VEP plugins
 + Build & run VEP with Docker
 + Acknowledgements

Introduction

This documentation describes the usage of the Docker image at https://hub.docker.com/r/matmu/vep which contains the bioinformatics tool Ensembl Variant effect predictor (VEP) for annotating genetic variants. The image comes with

  • Merged cache including RefSeq and Ensembl transcripts (VEP parameter --merged required)
  • Reference genome and index
  • Plugins (annotation data is not included)

Available versions

Human

106-GRCh38

105-GRCh38 105-GRCh37
103-GRCh38-merged 103-GRCh38
101-GRCh38
100-GRCh38 100-GRCh38-merged 100-GRCh37 100-GRCh37-merged
99-GRCh38-merged 99-GRCh37-merged

Mouse

105-GRCm39

105-GRCm39
103-GRCm39
101-GRCm38
100-GRCm38 100-GRCm38-merged

The term merged refers to the merged Ensembl/RefSeq cache. To be consistent with the Ensembl website, chose Ensembl cache only (i.e. without the term merged). Examples for available versions are 99-GRCh38 (VEP 99 with Ensembl cache for reference GRCh38) or 99-GRh37-merged (VEP 99 with Ensembl/Refseq cache for reference GRCh37).

You can also visit https://hub.docker.com/r/matmu/vep/tags to get a list of available versions.

Note: If you require a container for a species not mentioned above, feel free to contact us or even better, create an issue.

Build image with Singularity

singularity build vep.<version>.simg docker://matmu/vep:<version>

<version> is a tag representing the Ensembl version and the species + version of the reference genome.

Run VEP

To run VEP execute

singularity exec vep.<version>.simg vep [options]

whereby <version> is replaced by a respective version (see above), e.g. 99-CRCh38. It is essential to add the VEP option --merged when using an image with merged Ensembl/Refseq cache. For species except homo sapiens, also the parameter --species (e.g. --species mus_musculus), has to be set as well.

More options

The options for base cache/plugin directories, species and assembly are set to the right values by default and do not need to be set by the user.

Visit http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html for detailed information about all VEP options. Detailed information about input/output formats can be found at https://www.ensembl.org/info/docs/tools/vep/vep_formats.html#defaultout.

Examples

Minimum (output format: compressed tab delimited)

singularity exec vep.100-GRCh38-merged.simg vep --dir /opt/vep/.vep --merged --offline --cache --input_file <filename>.vcf[.gz] --output_file <filename>.txt.gz --tab --compress_output bgzip
singularity exec vep.100-GRCh38.simg vep --dir /opt/vep/.vep --offline --cache --input_file <filename>.vcf[.gz] --output_file <filename>.txt.gz --tab --compress_output bgzip
singularity exec vep.100-GRCm38.simg vep --dir /opt/vep/.vep --offline --cache --input_file <filename>.vcf[.gz] --output_file <filename>.txt.gz --tab --compress_output bgzip -species mus_musculus

Minimum (output format: compressed vcf)

singularity exec vep.100-GRCh38.simg vep --dir /opt/vep/.vep --offline --cache --input_file <filename>.vcf[.gz] --output_file <filename>.vcf.gz --vcf --compress_output bgzip

Full annotation

singularity exec vep.100-GRCh38.simg vep --dir /opt/vep/.vep --offline --cache --input_file <filename>.vcf[.gz] --output_file <filename>.vcf.gz --vcf --compress_output bgzip --everything --nearest symbol        

Post-processing

Split VEP

There is a plugin for bcftools that allows to split VEP annotations as well as sample information in a VCF file and convert it to a text file: http://samtools.github.io/bcftools/howtos/plugin.split-vep.html.

Filtering by VEP annotations

If you chose to output the VEP annotations as text file, any command line tool (e.g. awk) or even Excel can be used for filtering the results. For VCF files, the image includes a VEP filtering script which can be executed by

singularity exec vep.<version>.simg filter_vep [options]

Options

Visit https://www.ensembl.org/info/docs/tools/vep/script/vep_filter.html for detailed info about available options.

Filtering examples

Filter for rare variants
singularity exec vep.<version>.simg filter_vep --input_file <filename>.vcf --output_file <filename>.filtered.vcf --only_matched --filter "(IMPACT is HIGH or IMPACT is MODERATE or IMPACT is LOW) and (BIOTYPE is protein_coding) and ((PolyPhen > 0.446) or (SIFT < 0.05)) and (EUR_AF < 0.001 or gnomAD_NFE_AF < 0.001 or (not EUR_AF and not gnomAD_NFE_AF))" 

VEP plugins

VEP allows several other annotations sources (aka Plugins). Their respective Perl modules are included in the image, the annotation files have to be added seperately, however. The list of plugins as well as instructions on how to download and pre-process the annotation files can be found at: http://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html.

singularity exec vep.100-GRCh38-merged.simg vep --dir /opt/vep/.vep --merged --offline --cache --input_file <filename>.vcf[.gz] --output_file <filename>.txt.gz --tab --compress_output bgzip --plugin CADD,/path/to/ALL.TOPMed_freeze5_hg38_dbSNP.tsv.gz

Build and run VEP with Docker

To pull the image and run the container with Docker use

docker run matmu/vep:<version> vep [options]

Unlike Singularity, the directories of Plugin annotation files (e.g. /path/to/dir) have to be explicitely bound to a target directory (e.g. /opt/data) within the container with option -v:

docker run -v /path/to/dir:/opt/data matmu/vep:<version> vep [options]

Acknowledgments

This document has been created by Julia Remes & Matthias Munz, University of Lübeck, Germany.