This repository is a fork of the latest version of ExpansionHunter (v5.0.0) and includes the following modifications:
- By default, it treats the following errors as warnings and does not terminate the program: "Error loading locus x: Flanks can contain at most 5 characters N but found x Ns.", "Invalid contig name." and "Unable to extract X from Y.".
- The VCF files now include an `SVTYPE=STR` info field for each locus (absent in the original program).
- Users can specify the sample ID from the command line using the `--sample-id` parameter, e.g., `--sample-id ABCD1234`. By default, if not specified, the sample ID is derived from the file name.
- The program is compiled against updated libraries, which should resolve some build issues that occurred with older ones.
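To check the first two modifications on your own output, a short script along the following lines may help. It is only a sketch: the helper names (`parse_info`, `is_str_record`, `default_sample_id`) are hypothetical, the example record is made up for illustration, and the assumption that the default sample ID is the file name without its final extension is a guess at what "derived from the file name" means, not something confirmed by the program.

```python
from pathlib import Path


def parse_info(info_field: str) -> dict:
    """Split a VCF INFO column into a key -> value dict (bare flags map to True)."""
    info = {}
    for entry in info_field.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def is_str_record(vcf_line: str) -> bool:
    """Return True if a VCF data line carries SVTYPE=STR in its INFO column."""
    columns = vcf_line.rstrip("\n").split("\t")
    # INFO is the eighth tab-separated column of a VCF data line.
    return len(columns) >= 8 and parse_info(columns[7]).get("SVTYPE") == "STR"


def default_sample_id(reads_path: str) -> str:
    """Assumed fallback: the reads file name without its final extension."""
    return Path(reads_path).stem


# Illustrative record only; the coordinates and alleles are invented.
example = "chr4\t3074876\t.\tC\t<STR21>\t.\tPASS\tSVTYPE=STR;END=3074933;RU=CAG\tGT\t0/1"

if __name__ == "__main__":
    print(is_str_record(example))
    print(default_sample_id("/data/ABCD1234.bam"))
```

Passing `--sample-id` explicitly avoids relying on whatever the file-name fallback produces.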
- The Tools for ExpansionHunter repository has scripts for annotating VCF files with disease information and for converting BED files into variant catalogues.
- STRipy's ExpansionHunter Results Analyzer lets you quickly assess results from ExpansionHunter's output files.
- STRipy's ExpansionHunter Catalogue Creator helps create custom variant catalogues for different reference genomes.
There are a number of regions in the human genome consisting of repetitions of a short unit sequence (commonly a trimer). Such repeat regions can expand to a size much larger than the read length and thereby cause disease. Fragile X Syndrome, ALS, and Huntington's Disease are well-known examples.
Expansion Hunter aims to estimate sizes of such repeats by performing a targeted search through a BAM/CRAM file for reads that span, flank, and are fully contained in each repeat.
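The three read categories mentioned above can be illustrated with plain coordinates. The sketch below is a simplification for intuition only: it assumes half-open `[start, end)` reference coordinates and a hypothetical `classify_read` helper, whereas ExpansionHunter itself works with sequence-graph alignments rather than a coordinate comparison like this one.

```python
def classify_read(read_start: int, read_end: int,
                  repeat_start: int, repeat_end: int) -> str:
    """Classify a read against a repeat using half-open [start, end) coordinates."""
    # No overlap with the repeat at all.
    if read_end <= repeat_start or read_start >= repeat_end:
        return "outside"
    # The read extends into both flanks, so it covers the whole repeat.
    if read_start < repeat_start and read_end > repeat_end:
        return "spanning"
    # The read lies entirely within the repeat.
    if read_start >= repeat_start and read_end <= repeat_end:
        return "in-repeat"
    # Otherwise the read overlaps the repeat and reaches into exactly one flank.
    return "flanking"


if __name__ == "__main__":
    # Hypothetical repeat occupying positions [100, 160).
    for read in [(80, 180), (110, 150), (80, 130), (0, 50)]:
        print(read, classify_read(*read, 100, 160))
```

When the repeat is longer than the read length, no read can span it, so the size estimate has to draw on flanking and in-repeat reads.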
Linux and macOS operating systems are currently supported.
Expansion Hunter is provided under the terms and conditions of the Apache License Version 2.0. It relies on several third-party packages provided under other open source licenses; please see COPYRIGHT.txt for additional details.
Installation instructions, usage guide, and description of file formats are contained in the docs folder.
- A genome-wide STR catalog containing polymorphic repeats with similar properties to known pathogenic and functional STRs
- REViewer, a tool for visualizing alignments of reads in regions containing tandem repeats
The method is described in the following papers:
- Egor Dolzhenko, Joke van Vugt, Richard Shaw, Mitch Bekritsky, and others, Detection of long repeat expansions from PCR-free whole-genome sequence data, Genome Research 2017
- Egor Dolzhenko, Viraj Deshpande, Felix Schlesinger, Peter Krusche, Roman Petrovski, and others, ExpansionHunter: A sequence-graph based tool to analyze variation in short tandem repeat regions, Bioinformatics 2019