RabbitQCPlus (the successor of RabbitQC) has been released recently. We have solved the main performance issues in processing gzip-compressed files and performing the time-consuming over-representation analysis. The new repo is available at https://github.com/RabbitBio/RabbitQCPlus.
RabbitQC is a tool designed to provide high-speed, scalable quality control for sequencing data that takes full advantage of modern hardware. It includes a variety of function modules and supports different sequencing technologies (Illumina, Oxford Nanopore, PacBio). RabbitQC achieves speedups of between one and two orders of magnitude compared to other state-of-the-art tools.
- For single end data (not compressed)
rabbit_qc -w nthreads -i in.fq -o out.fq
- For paired end data (gzip compressed)
rabbit_qc -w nthreads -i in.R1.fq.gz -I in.R2.fq.gz -o out.R1.fq.gz -O out.R2.fq.gz
rabbit_qc -w nthreads -D -i in.fq
A more efficient strategy for processing large gzip-compressed FASTQ files is to decompress them with pugz and then process them with RabbitQC. Pugz has been integrated into the RabbitQC project.
cd RabbitQC/pugz && make asserts=0
./gunzip -t nthreads in.fq.gz
For more help information, please refer to rabbit_qc -h.
If the -w option is not specified, RabbitQC sets the number of working threads to the total number of CPU cores minus 2.
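The default described above can be reproduced explicitly when a fixed, visible thread count is wanted in a script. This is a minimal sketch, not RabbitQC's own logic: the helper name default_threads and the clamp to a minimum of 1 thread are assumptions made for illustration.

```shell
# Sketch: compute "total CPU cores - 2" for use with -w.
# default_threads is a hypothetical helper; clamping to 1 is an assumption
# so that machines with 1 or 2 cores still get a usable value.
default_threads() {
    dt_cores="$1"
    dt_n=$((dt_cores - 2))
    if [ "$dt_n" -lt 1 ]; then
        dt_n=1
    fi
    echo "$dt_n"
}

# getconf _NPROCESSORS_ONLN reports online cores on Linux and macOS;
# fall back to 1 if it is unavailable.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
nthreads=$(default_threads "$cores")

# Show the resulting command line instead of running it.
echo "rabbit_qc -w $nthreads -i in.fq -o out.fq"
```

Passing the value explicitly makes the thread count reproducible across machines and easy to log alongside the results.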
By default, the HTML report is saved to RabbitQC.html (can be specified with the -h option), and the JSON report is saved to RabbitQC.json (can be specified with the -j option).
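When several samples are processed in the same directory, the default report names would overwrite each other, so it can help to set both report paths per sample. The sketch below only builds and prints such a command line. The helper name report_cmd, the .clean.fq output suffix, and the thread count of 4 are illustrative assumptions; the -w, -i, -o, -h and -j options are used as described above.

```shell
# Sketch: build a rabbit_qc command line that names both reports after the sample.
# report_cmd is a hypothetical helper; it prints the command rather than running it.
report_cmd() {
    rq_sample="$1"
    echo "rabbit_qc -w 4 -i ${rq_sample}.fq -o ${rq_sample}.clean.fq -h ${rq_sample}.html -j ${rq_sample}.json"
}

# Print the command for one sample; review it, then run it in your shell.
report_cmd sampleA
```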
RabbitQC supports all fastp options for short-read quality control and all NanoQC options for long-read quality control. For details, please refer to fastp and NanoQC.
RabbitQC creates reports in both HTML and JSON format.
For Linux and OSX:
cd RabbitQC && make
For Windows:
We provide a prebuilt binary for x64 Windows (tested on 64-bit Windows 10) here. Alternatively, you can build RabbitQC using MSYS2.
cd RabbitQC && make
Zekun Yin, Hao Zhang, Meiyang Liu, Wen Zhang, Honglei Song, Haidong Lan, Yanjie Wei, Beifang Niu, Bertil Schmidt, Weiguo Liu, RabbitQC: High-speed scalable quality control for sequencing data, Bioinformatics, btaa719, https://doi.org/10.1093/bioinformatics/btaa719