ExpansionHunterPlus

Primary language: C++. License: Apache License 2.0 (Apache-2.0).

Information about this repository

This repository is a fork of the latest version of ExpansionHunter (v5.0.0) and includes the following modifications:

  1. By default, the following errors are treated as warnings and no longer terminate the program: "Error loading locus x: Flanks can contain at most 5 characters N but found x Ns.", "Invalid contig name." and "Unable to extract X from Y.".
  2. The VCF output now includes an SVTYPE=STR INFO field for each locus (absent in the original program).
  3. Users can specify the sample ID from the command line using the --sample-id parameter, e.g., --sample-id ABCD1234. If the parameter is omitted, the sample ID is derived from the file name.
  4. The build uses updated libraries, which should resolve some build issues that occurred with older versions.

Other useful resources


Expansion Hunter: a tool for estimating repeat sizes

There are a number of regions in the human genome consisting of repetitions of a short unit sequence (commonly a trimer). Such repeat regions can expand to a size much larger than the read length and thereby cause disease. Fragile X Syndrome, ALS, and Huntington's Disease are well-known examples.
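
To make the idea of a repeated unit concrete, the sketch below counts the longest run of consecutive copies of a unit (for example the trimer CAG) in a sequence. The function name longestRepeatRun is hypothetical, and this naive scan only works when the whole repeat fits inside the sequence being examined; it is not how Expansion Hunter estimates repeat sizes.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Returns the largest number of back-to-back copies of `unit` found anywhere
// in `seq`. Illustrative only: every start position is tried in turn.
int longestRepeatRun(const std::string& seq, const std::string& unit)
{
    if (unit.empty())
    {
        return 0;
    }
    int best = 0;
    for (std::size_t start = 0; start + unit.size() <= seq.size(); ++start)
    {
        int run = 0;
        std::size_t pos = start;
        // Extend the run while the next unit-sized chunk matches exactly.
        while (pos + unit.size() <= seq.size() && seq.compare(pos, unit.size(), unit) == 0)
        {
            ++run;
            pos += unit.size();
        }
        best = std::max(best, run);
    }
    return best;
}
```

For the sequence TTCAGCAGCAGAA and the unit CAG, the function returns 3.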

Expansion Hunter aims to estimate the sizes of such repeats by performing a targeted search through a BAM/CRAM file for reads that span, flank, or are fully contained in each repeat.
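
The three read categories can be illustrated with a simplified, coordinate-based sketch. It assumes half-open reference intervals [start, end) for both the read and the repeat; the names ReadType and classifyRead are hypothetical, and the tool's actual alignment logic is not reproduced here.

```cpp
// Simplified read categories relative to a single repeat region.
enum class ReadType
{
    Spanning,  // covers the whole repeat plus sequence on both sides
    Flanking,  // overlaps the repeat and exactly one side of it
    InRepeat,  // lies entirely inside the repeat
    Unrelated  // does not overlap the repeat at all
};

// Classifies a read occupying [readStart, readEnd) against a repeat occupying
// [repeatStart, repeatEnd). All coordinates are on the same contig.
ReadType classifyRead(long readStart, long readEnd, long repeatStart, long repeatEnd)
{
    const bool overlaps = readStart < repeatEnd && readEnd > repeatStart;
    if (!overlaps)
    {
        return ReadType::Unrelated;
    }
    if (readStart < repeatStart && readEnd > repeatEnd)
    {
        return ReadType::Spanning;
    }
    if (readStart >= repeatStart && readEnd <= repeatEnd)
    {
        return ReadType::InRepeat;
    }
    // Remaining case: the read crosses exactly one repeat boundary.
    return ReadType::Flanking;
}
```

Spanning reads can only exist when the repeat is shorter than the read length, which is why flanking and in-repeat reads matter for expansions larger than a read.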

Linux and macOS operating systems are currently supported.

License

Expansion Hunter is provided under the terms and conditions of the Apache License Version 2.0. It relies on several third-party packages provided under other open source licenses; please see COPYRIGHT.txt for additional details.

Documentation

Installation instructions, a usage guide, and descriptions of the file formats are in the docs folder.

Companion tools and resources

  • A genome-wide STR catalog containing polymorphic repeats with similar properties to known pathogenic and functional STRs
  • REViewer, a tool for visualizing alignments of reads in regions containing tandem repeats

Method

The method is described in the following papers: