AlignerBoost manual

AlignerBoost is a generalized software toolkit for boosting next-generation sequencing (NGS) mapping precision using a Bayesian-based mapping quality framework.

AlignerBoost works with any NGS aligner that can produce standard SAM/BAM alignment output. Aligners for which AlignerBoost has been optimized for mapping precision and sensitivity currently include:

  • DNA aligners: Bowtie, Bowtie2, BWA-ALN/BWA-SW/BWA-MEM, NovoAlign, SeqAlTo
  • RNA aligners: Tophat, Tophat2, STAR

AlignerBoost works by tuning NGS aligners to report all potential alignments, and then uses a Bayesian framework to estimate the mapping quality of ambiguously mapped reads accurately.
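As a rough illustration of the idea, the sketch below turns per-locus alignment likelihoods for one read into posterior probabilities and a Phred-scaled mapping quality. It is a minimal sketch, not AlignerBoost's actual implementation: the class and method names are hypothetical, a uniform prior over candidate loci is assumed, and the likelihoods are assumed to be given as log10 values.

```java
import java.util.Arrays;

class MapQSketch {
    /**
     * Posterior probability of each candidate locus for one read,
     * given log10 likelihoods and an assumed uniform prior.
     */
    static double[] posteriors(double[] log10Lik) {
        // Subtract the maximum before exponentiating to avoid underflow.
        double max = Arrays.stream(log10Lik).max().getAsDouble();
        double[] post = new double[log10Lik.length];
        double sum = 0.0;
        for (int i = 0; i < log10Lik.length; i++) {
            post[i] = Math.pow(10.0, log10Lik[i] - max);
            sum += post[i];
        }
        for (int i = 0; i < post.length; i++) {
            post[i] /= sum;
        }
        return post;
    }

    /** Phred-scaled mapping quality, -10 * log10(1 - posterior), capped at maxQ. */
    static int mapQ(double posterior, int maxQ) {
        double errProb = 1.0 - posterior;
        if (errProb <= 0.0) {
            return maxQ; // the read is effectively uniquely mapped
        }
        long q = Math.round(-10.0 * Math.log10(errProb));
        return (int) Math.min(maxQ, Math.max(0L, q));
    }

    public static void main(String[] args) {
        // Hypothetical read with two candidate loci; the first fits 100x better.
        double[] post = posteriors(new double[] {-1.0, -3.0});
        for (double p : post) {
            System.out.printf("posterior=%.4f mapQ=%d%n", p, mapQ(p, 255));
        }
    }
}
```

A read with two equally likely loci gets a posterior of 0.5 at each and therefore a low mapping quality, whereas a read whose best locus is far more likely than the alternatives keeps a high one.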

AlignerBoost can dramatically increase mapping precision without a significant loss of sensitivity under various experimental strategies.

AlignerBoost is SNP-aware, and higher quality alignments can be achieved if provided with known SNPs.

Download and installation

You can download the latest executable release from GitHub at: https://github.com/Grice-Lab/AlignerBoost/releases. You can also download, or fork and pull, the source code from GitHub at: https://github.com/Grice-Lab/AlignerBoost. AlignerBoost is written in pure Java, so you can run it without installation on Unix/Linux, Mac OS X, and Windows by simply typing "java -jar AlignerBoost.jar" in the shell/terminal.

Dependencies

AlignerBoost does not depend directly on any third-party library. However, if you use AlignerBoost's best practice to generate executable shell scripts, your NGS aligner of choice must be available in the PATH for these scripts to run. You might also need other programs in the PATH for some of AlignerBoost's pre-processing functionality. See "examples/README.example" for best practice.

Customized SAM format tags

AlignerBoost uses a set of customized tags in generated SAM/BAM files to store auxiliary alignment information calculated during its filtering process. These tags are listed below. Note: X? tags are global, Y? tags relate to the seed region, and Z? tags relate to the entire alignment.

Tag Type Description

  • XA i alignment length, including M,=,X,I,D,S but not H,P,N
  • XL i insert length, including M,=,X,I,D but not S,H,P,N, determined by Cigar or 1DP
  • XF i actual insert from (start) relative to reference
  • XI f alignment identity as 1 - (YX + YG) / XL
  • XH Z alignment likelihood given this mapping locus and base quality, in string format to preserve double precision
  • XV i known SNVs (if any) used in calculating XH
  • XP Z alignment posterior probability in string format to preserve double precision
  • XT Z genetic type (GTYPE) string generated by 'utils classifySAM'
  • YL i seed length
  • YX i No. of seed mismatches
  • YG i No. of seed indels
  • ZX i No. of all mismatches
  • ZG i No. of all indels
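To show how these tags might be read downstream, the sketch below parses the optional TAG:TYPE:VALUE fields of a single SAM record and recomputes the identity from the XI formula given in the table. It is a minimal sketch using only the standard library: the class and method names and the example record are hypothetical, and a production tool would more likely rely on a dedicated SAM/BAM library.

```java
import java.util.HashMap;
import java.util.Map;

class SamTagSketch {
    /** Parse the optional fields (columns 12 and onward) of one SAM record into tag -> value. */
    static Map<String, String> parseTags(String samLine) {
        String[] cols = samLine.split("\t");
        Map<String, String> tags = new HashMap<>();
        for (int i = 11; i < cols.length; i++) {
            // Limit the split to 3 parts because Z-type values may contain ':'.
            String[] parts = cols[i].split(":", 3);
            if (parts.length == 3) {
                tags.put(parts[0], parts[2]);
            }
        }
        return tags;
    }

    /** Identity as defined for XI in the table above: 1 - (YX + YG) / XL. */
    static double identity(Map<String, String> tags) {
        int yx = Integer.parseInt(tags.get("YX"));
        int yg = Integer.parseInt(tags.get("YG"));
        int xl = Integer.parseInt(tags.get("XL"));
        return 1.0 - (double) (yx + yg) / xl;
    }

    public static void main(String[] args) {
        // Hypothetical record; SEQ and QUAL are left as '*' for brevity.
        String rec = "read1\t0\tchr1\t100\t30\t50M\t*\t0\t0\t*\t*"
                + "\tXL:i:50\tYX:i:1\tYG:i:0\tXP:Z:0.998";
        Map<String, String> tags = parseTags(rec);
        System.out.println("identity = " + identity(tags));
        // XP is stored as a string (type Z), so it is parsed explicitly here.
        System.out.println("posterior = " + Double.parseDouble(tags.get("XP")));
    }
}
```

Because XH and XP are stored as type Z strings, a consumer has to convert them to doubles itself, as the last line of `main` does.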

Best practice

To take full advantage of AlignerBoost and increase your mapping precision and sensitivity, we recommend using our Best Practice Pipeline. Just download our Best Practice Example README and configuration file, edit the configuration file using your favorite text or spreadsheet editor, and start your analysis!

QC and pre-processing tools

These are recommended QC and pre-processing procedures that are intended to be called indirectly by the shell scripts generated by the "best practice" steps. Try running "java -jar AlignerBoost.jar" for details.

Core programs

Core programs are the fundamental tools used to pick the most probable (highest mapQ) alignments using AlignerBoost's Bayesian framework. Try running "java -jar AlignerBoost.jar run" for details.

Statistic summary programs

These are summary tools recommended during the "best practice" procedures. They generate, and subsequently update, a tab-delimited report file for the runs/libraries processed in a given study. Try running "java -jar AlignerBoost.jar stats" for details.

Utility program summaries

Utility tools for manipulating common genomic data files, such as SAM/BAM, BED, WIG, VCF/gVCF and more.

  • sam2AbsCover convert a SAM/BAM file to customized tab-delimited coverage file with absolute location coordinates
  • sam2RelCover convert a SAM/BAM file to customized tab-delimited coverage file with relative position coordinates
  • sam2BinCover convert a SAM/BAM file to customized tab-delimited coverage file with binned (%) coordinates
  • sam2RegCount count reads from a SAM/BAM file in given regions from a BED file
  • sam2CoverSumm get simple read cover summary table from a SAM/BAM file
  • sam2Wig convert a SAM/BAM file to a UCSC Wiggle fixed format file
  • bed2Wig convert a BED6 file to a UCSC Wiggle fixed format file
  • bed2AbsCover convert a BED6 file to customized tab-delimited coverage file with absolute location coordinates
  • filterSamById filter a SAM/BAM file with a given ID list
  • classifySAM fast index-based classification of a SAM/BAM file given genomic annotations from GFF file(s)
  • classifyVCF fast index-based classification of a VCF/gVCF variation file given genomic annotations from GFF file(s)
  • classifyBED fast index-based classification of a BED file given genomic annotations from GFF file(s)
  • filterWigFix filter UCSC Wiggle fixed format file(s) with given regions in BED file
  • filterWigVar filter UCSC Wiggle variable format file(s) with given regions in BED file
  • wigFix2RelCover convert UCSC Wiggle fixed format file(s) to a tab-delimited coverage file in given regions
  • wigVar2RelCover convert UCSC Wiggle variable format file(s) to a tab-delimited coverage file in given regions

Try running "java -jar AlignerBoost.jar utils" for details.
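Several of these utilities combine SAM/BAM alignments with BED regions, and the two formats use different coordinate conventions: SAM positions are 1-based and inclusive, while BED intervals are 0-based and half-open. The sketch below shows one way a region count in the spirit of sam2RegCount could handle this. It is a minimal sketch, not the tool's actual logic: the class and method names are hypothetical, and chromosome names, strand, and CIGAR parsing are deliberately left out.

```java
class RegionCountSketch {
    /**
     * True if an alignment starting at 1-based samPos and spanning refLen
     * reference bases overlaps the BED interval [bedStart, bedEnd).
     */
    static boolean overlaps(int samPos, int refLen, int bedStart, int bedEnd) {
        int alnStart = samPos - 1;      // convert to 0-based, inclusive
        int alnEnd = alnStart + refLen; // 0-based, exclusive
        return alnStart < bedEnd && bedStart < alnEnd;
    }

    /** Count alignments, given as {samPos, refLen} pairs, that overlap one BED interval. */
    static int countOverlaps(int[][] alignments, int bedStart, int bedEnd) {
        int n = 0;
        for (int[] aln : alignments) {
            if (overlaps(aln[0], aln[1], bedStart, bedEnd)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical alignments on a single chromosome: {samPos, refLen}.
        int[][] alns = { {51, 50}, {52, 50}, {101, 50}, {201, 50} };
        System.out.println(countOverlaps(alns, 100, 200));
    }
}
```

An alignment that ends exactly at the BED start, or begins exactly at the BED end, does not count as overlapping under the half-open convention.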