HmnTrimmer: a fast, versatile read trimmer for several next-generation sequencing applications

License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

HmnTrimmer

Introduction

HmnTrimmer trims reads produced by next-generation sequencing (NGS). It is designed for common applications such as genomics, transcriptomics, targeted metagenomics and shotgun metagenomics.

Getting Started

Prerequisites

To build and run the software on Debian-based systems, install:

  • yasm
  • build-essential
  • zlib1g-dev

The GCC version used for compilation must be newer than 4 and older than 8.

To run the tests, install:

  • python3

Installing

First, build igzip:
hmndir=./HmnTrimmer
cd ./lib/igzip-042/igzip && make slib0c
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD

Then build HmnTrimmer:
make

Test

make test

Running

The software is invoked as follows:
HmnTrimmer [OPTIONS] [TRIMMERS]

Minimal example:

./HmnTrimmer \
  --input-fastq-forward INPUT_FILE \
  --output-fastq-forward OUTPUT_FILE \
  --length-min 50

Commands

Input/Output

Input and output files are specified with the following options:

  --input-fastq-forward INPUT_FILE
  --input-fastq-reverse INPUT_FILE
  --input-fastq-interleaved INPUT_FILE

  --output-fastq-forward OUTPUT_FILE
  --output-fastq-reverse OUTPUT_FILE
  --output-fastq-interleaved OUTPUT_FILE

Discarded sequences can optionally be written to a file with the following option. For paired-end data, the resulting file is interleaved.

  --output-fastq-discard OUTPUT_FILE
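
As an illustration of how these options combine, here is a minimal Python sketch that assembles a paired-end command line. The function name `build_command` and the default executable path are hypothetical and not part of HmnTrimmer; only the option names come from the list above.

```python
from typing import List, Optional


def build_command(
    forward_in: str,
    forward_out: str,
    reverse_in: Optional[str] = None,
    reverse_out: Optional[str] = None,
    discard_out: Optional[str] = None,
    length_min: Optional[int] = None,
    executable: str = "./HmnTrimmer",  # assumed location of the binary
) -> List[str]:
    """Assemble an argument list using only the options documented above."""
    cmd = [
        executable,
        "--input-fastq-forward", forward_in,
        "--output-fastq-forward", forward_out,
    ]
    # Optional arguments are appended only when a value was supplied.
    if reverse_in is not None:
        cmd += ["--input-fastq-reverse", reverse_in]
    if reverse_out is not None:
        cmd += ["--output-fastq-reverse", reverse_out]
    if discard_out is not None:
        cmd += ["--output-fastq-discard", discard_out]
    if length_min is not None:
        cmd += ["--length-min", str(length_min)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list instead of a single shell string avoids quoting problems with file names that contain spaces.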

Trimmers

Trimmers fall into three categories: quality, length and information.
Information-based trimmers are applied first, then quality-based trimmers, and finally length-based trimmers.
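
The ordering can be pictured with a small Python sketch. It is not HmnTrimmer's code: the names `CATEGORY_ORDER` and `order_trimmers` are hypothetical, and keeping the command-line order within a category is an assumption of this sketch.

```python
from typing import List, Tuple

# Category order as described above: information, then quality, then length.
CATEGORY_ORDER = ("information", "quality", "length")


def order_trimmers(requested: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (category, option) pairs by category.

    sorted() is stable, so trimmers of the same category keep the order in
    which they were requested (an assumption, not documented behaviour).
    """
    return sorted(requested, key=lambda item: CATEGORY_ORDER.index(item[0]))
```

For example, a request listing `--length-min` before `--information-dust` would still run the information trimmer first.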

Quality Tail

Trims a read based on the number of consecutive bases at its end whose quality is below a cut-off.
Three parameters: the quality cut-off; optionally, the number of consecutive bases that must be below this cut-off (default: 1 base); and optionally, the minimum percentage of its length that a truncated read must retain to be kept (default: truncated reads are not removed).
Format: <int>:<int>:<int>

  --quality-tail STRING
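
One possible reading of this trimmer is sketched below in Python. It is an interpretation of the description above, not HmnTrimmer's implementation: the function name `quality_tail`, the strict "below" comparison, and the way the percentage is checked are all assumptions.

```python
from typing import List, Optional


def quality_tail(
    quals: List[int],
    threshold: int,
    min_run: int = 1,
    min_percent: Optional[int] = None,
) -> Optional[int]:
    """Return the number of bases to keep, or None if the read is discarded.

    quals holds one Phred score per base, in read order.
    """
    keep = len(quals)
    # Walk back from the end of the read over the run of low-quality bases.
    while keep > 0 and quals[keep - 1] < threshold:
        keep -= 1
    run = len(quals) - keep
    if run < min_run:
        return len(quals)  # trailing run too short: the read is left untouched
    # Assumed rule: a truncated read must retain min_percent of its length.
    if min_percent is not None and keep * 100 < min_percent * len(quals):
        return None
    return keep
```

Under these assumptions, a value such as `20:3:50` would map to `threshold=20`, `min_run=3`, `min_percent=50`.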

Quality Sliding Window

Trims a read using a sliding window of bases, starting from the end of the read, while the mean quality of the window is below a minimum.
Two parameters: the minimum mean quality and the window size.
Format: <int>:<int>

  --quality-sliding-window STRING
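
The idea can be sketched in Python as follows. This is one plausible interpretation rather than HmnTrimmer's actual algorithm: the function name `sliding_window_trim`, the one-base step, and the choice to leave reads shorter than the window untouched are assumptions.

```python
from typing import List


def sliding_window_trim(quals: List[int], min_mean: int, size: int) -> int:
    """Return the number of bases to keep after trimming from the read end.

    The window covers the last `size` kept bases. While its mean quality is
    below `min_mean`, one base is dropped from the end and the window slides.
    """
    keep = len(quals)
    while keep >= size:
        window = quals[keep - size:keep]
        # Compare sums instead of means to stay in integer arithmetic.
        if sum(window) >= min_mean * size:
            break
        keep -= 1
    return keep
```

Under these assumptions, a value such as `20:5` would map to `min_mean=20`, `size=5`.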

Length Min

Minimum length required to keep a read.

  --length-min INTEGER

Information Dust

Based on the DUST score, a measure of low sequence complexity.

  --information-dust INTEGER
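
For readers unfamiliar with DUST, the sketch below computes a triplet-based score in Python. It follows the commonly used definition (sum over triplets of c(c-1)/2, divided by the number of triplets minus one); whether HmnTrimmer uses this exact variant, and on which scale the INTEGER threshold is expressed, is not stated here, so treat both as assumptions. The function name `dust_score` is hypothetical.

```python
from collections import Counter


def dust_score(seq: str) -> float:
    """Triplet-based DUST score; higher values mean lower complexity."""
    seq = seq.upper()
    n_triplets = len(seq) - 2
    if n_triplets < 2:
        return 0.0  # too short to score
    # Count every overlapping 3-mer in the sequence.
    counts = Counter(seq[i:i + 3] for i in range(n_triplets))
    # Each triplet seen c times contributes c * (c - 1) / 2 repeated pairs.
    total = sum(c * (c - 1) / 2 for c in counts.values())
    return total / (n_triplets - 1)
```

A homopolymer such as `AAAAAAAAAA` scores high, while a sequence in which every triplet is unique scores 0.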

Performance/Other Options:

Report

Optionally save a report containing various statistics, in JSON format.

  --output-report OUTPUT_FILE

Threads

Specify number of threads to use.

  --threads 1..8

Reads batch

Reads are loaded in batches. This option sets the batch size.

  --reads-batch 100..50000000
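
To illustrate what batched reading means, here is a minimal Python sketch that groups four-line FASTQ records into batches of a fixed size. It only illustrates the concept and is not how HmnTrimmer reads files; `fastq_records` and `batches` are hypothetical names.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Yield (header, sequence, separator, quality) tuples from FASTQ lines."""
    it = iter(lines)
    while True:
        chunk = tuple(line.rstrip("\n") for line in islice(it, 4))
        if len(chunk) < 4:
            return  # end of input; an incomplete trailing record is ignored
        yield chunk


def batches(records: Iterable[Tuple[str, ...]], size: int) -> Iterator[List[Tuple[str, ...]]]:
    """Group records into lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

A larger batch generally means fewer, bigger units of work at the cost of more memory held at once.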

Verbose

Log level to use.

  --verbose 1..6 (error..trace)

Docker

Build Image

docker build -t hmntrimmer:1.0.0 .
To save space, the test folder is not copied into the image.

Run

docker run \
    -it \
    --rm \
    -v $PWD:$PWD \
    hmntrimmer:1.0.0 \
    --input-fastq-forward $PWD/test/GoldInput/BIG.R1.fastq \
    --output-fastq-forward $PWD/test/DockerTest.R1.fastq.gz \
    --length-min 50

Built with these main libraries

  • SeqAn - Essential library for working with HTS files and algorithms
  • rapidjson - Read/Write JSON files efficiently
  • spdlog - Nice log manager
  • igzip - Very fast deflate algorithm

Versioning

SemVer is used for versioning.

Authors

  • Guillaume Gricourt - Initial work

License

See the LICENSE.md file for details