program: fqtool
version: 0.0.0
updated: 14:53:20 May 15 2019
Usage: fqtool [OPTIONS]
Options | Explanations |
---|---|
-h,--help | Print this help message and exit |
IO Options: | |
-i FILE REQUIRED | read1 input file name |
-o TEXT REQUIRED | read1 output file name |
-I FILE Needs: -i Excludes: --in_fq_interleaved | read2 input file name |
-O TEXT Needs: -I | read2 output file name |
--unpaired_read1 TEXT | output read1 whose mate failed QC |
--unpaired_read2 TEXT | output read2 whose mate failed QC |
--failed_out TEXT | output failed QC reads |
--phred64 | input fastq is phred64 |
-z INT in [1 - 9] | gzip output compress level |
--in_fq_interleaved Excludes: -I | input fastq interleaved |
Merge: | |
-m Needs: -I Excludes: -s -S | merge overlapped readpair |
--discard_unmerged Needs: -m | discard unmerged reads |
--merge_output TEXT Needs: -m | merged output |
Duplication: | |
-d | enable duplication analysis |
--dup_ana_key_len INT in [12 - 31]=12 Needs: -d | duplication analysis key length |
--dup_ana_hist_size INT in [1 - 10000]=32 Needs: -d | duplicate analysis hist size |
Adapter: | |
-a | enable adapter trimming |
--adapter_of_read1 TEXT Needs: -a | adapter of read1 |
--adapter_of_read2 TEXT Needs: -a | adapter of read2 |
--detect_pe_adapter Needs: -I | detect PE adapters |
Trim: | |
-f INT in [0 - 1000]=0 | bases trimmed in read1 front |
-t INT in [0 - 1000]=0 | bases trimmed in read1 tail |
-b INT in [0 - 1000]=0 | read1 max length allowed |
-F INT in [0 - 1000]=0 | bases trimmed in read2 front |
-T INT in [0 - 1000]=0 | bases trimmed in read2 tail |
-B INT in [0 - 1000]=0 | read2 max length allowed |
PolyX: | |
-g | enable polyG trim |
--min_len_detect_polyG INT=10 Needs: -g | minimum length to detect polyG |
--max_mismatches_polyG INT=1 Needs: -g | maximum mismatches allowed for matched polyG |
--one_mismatch_each_polyG INT=10 Needs: -g | allow one mismatch per this many bases in matched polyG |
-x | enable polyX trim |
--base_to_trim TEXT=ATCGN Needs: -x | nucleotides to trim |
--min_len_detect_polyX INT=10 Needs: -x | minimum length to detect polyX |
--max_mismatches_polyX INT=1 Needs: -x | maximum mismatches allowed for matched polyX |
--one_mismatch_each_polyX INT=10 Needs: -x | allow one mismatch per this many bases in matched polyX |
Cut: | |
--enable_cut_front | slide and drop from 5'->3' |
--enable_cut_tail | slide and drop from 3'->5' |
--enable_cut_right | slide from 5'->3' and drop window and right part |
-W INT in [0 - 1000]=4 | window size for cut sliding |
-M INT in [1 - 36]=20 | min mean quality to drop window/bases |
--cut_front_window INT in [0 - 1000]=4 Needs: --enable_cut_front | window size to cut from 5' |
--cut_tail_window INT in [0 - 1000] Needs: --enable_cut_tail | window size to cut from 3' |
--cut_right_window INT in [0 - 1000]=4 Needs: --enable_cut_right | window size to cut right |
--cut_front_mean_qual INT in [1 - 36]=20 Needs: --enable_cut_front | mean quality to cut from 5' |
--cut_tail_mean_qual INT in [1 - 36] Needs: --enable_cut_tail | mean quality to cut from 3' |
--cut_right_mean_qual INT in [1 - 36]=20 Needs: --enable_cut_tail | mean quality to cut right |
Qual: | |
-q | enable quality filter |
-Q INT in [33 - 75]=20 Needs: -q | minimum ASCII quality for qualified bases |
-U INT in [0 - 1]=0.15 Needs: -q | maximum low quality ratio allowed in one read |
-N INT=5 Needs: -q | maximum N bases allowed in one read |
-e FLOAT Needs: -q | average quality needed for one read |
Length: | |
-l | enable length filter |
--min_length INT in [0 - 1000]=15 Needs: -l | min length required for a read |
--max_length INT in [0 - 1000]=0 Needs: -l | max length allowed for a read |
Complexity: | |
-y | enable low complexity filter |
-Y INT in [0 - 1]=0.3 Needs: -y | min complexity required for a read |
Index: | |
--enable_index_filter | enable index filtering |
--index1_file FILE Needs: --enable_index_filter | index1 file to filter |
--index2_file FILE Needs: --enable_index_filter | index2 file to filter |
--max_diff_for_match INT in [0 - 10]=0 Needs: --enable_index_filter | max edit distance to validate index match |
Correction: | |
-c | enable base correction in PE reads |
--min_overlap_len INT in [0 - 1000]=30 | min overlap length needed for overlap analysis |
--max_diff_for_overlap INT in [0 - 10]=5 | max edit distance to validate overlap |
UMI: | |
-u | enable UMI preprocess |
--umi_location INT in [1 - 6]=0 Needs: -u | 0 [none], 1 [index1], 2 [index2], 3 [read1], 4 [read2], 5 [perindex], 6 [perread] |
--umi_length INT in [0 - 1000]=0 Needs: -u | umi length |
--umi_skip_length INT in [0 - 1000]=0 Needs: -u | bases to skip after umi |
--umi_drop_comment Needs: -u | drop other comment information |
--umi_not_trim Needs: -u | do not trim reads |
ORA: | |
--ora | enable ORA |
--ora_sample INT in [1 - 10000]=20 Needs: --ora | ORA sampling steps |
KMer: | |
--kmer | enable kmer analysis |
--kmer_length INT in [4 - 16]=0 Needs: --kmer | kmer length to analyze |
Report: | |
-J TEXT=report.json | json format report file |
-H TEXT=report.html | html format report file |
System: | |
-w INT in [1 - 16]=4 | worker thread number |
--max_packs_in_repo INT in [1 - 1000000]=1000 | max packs in repo |
--max_item_in_pack INT in [1 - 1000000]=100000 | max read/pairs in pack |
--max_packs_in_mem INT in [1 - 1000000]=5 | max packs in memory |
Split: | |
-s Excludes: -m -S | split output by file number |
--split_file_number INT Needs: -s | total split output file number |
-S Excludes: -m -s | max line of each output file |
--splie_file_line UINT Needs: -S | split output file line limit |
--digits_file_name INT in [1 - 10]=0 | digits for sequential output filename |
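
The `Cut:` options above describe a sliding-window trim ("slide and drop from 5'->3'"). The following is a minimal Python sketch of that idea, assuming the window advances one base at a time and stops at the first window whose mean quality reaches the threshold. The function names `phred_scores`, `cut_front` and `cut_tail` are hypothetical and only illustrate the roles of `-W` and `-M`; fqtool's actual implementation may differ in its details.

```python
def phred_scores(qual, offset=33):
    """Decode a FASTQ quality string; use offset=64 for phred64 input."""
    return [ord(ch) - offset for ch in qual]


def cut_front(seq, qual, window=4, min_mean_qual=20, offset=33):
    """Drop leading bases until a window with mean quality >= threshold is found."""
    # Assumption: reads shorter than the window (or window <= 0) are left untouched.
    if window <= 0 or len(seq) < window:
        return seq, qual
    scores = phred_scores(qual, offset)
    start = 0
    while start + window <= len(seq):
        if sum(scores[start:start + window]) / window >= min_mean_qual:
            return seq[start:], qual[start:]
        start += 1
    # No window passed the threshold: nothing is kept.
    return "", ""


def cut_tail(seq, qual, window=4, min_mean_qual=20, offset=33):
    """Same procedure from the 3' end, implemented by reversing the read."""
    rseq, rqual = cut_front(seq[::-1], qual[::-1], window, min_mean_qual, offset)
    return rseq[::-1], rqual[::-1]


if __name__ == "__main__":
    # '#' is Phred 2 and 'I' is Phred 40 with the default offset of 33.
    print(cut_front("ACGTACGTAC", "###IIIIIII"))
    print(cut_tail("ACGTACGTAC", "IIIIIII###"))
```

Note that a window-based cut can keep a few low-quality bases at the edge, because the decision is made on the window mean rather than on single bases.
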
Installation
- clone repo
git clone https://github.com/vanNul/fqtool
- compile
cd pipe
./autogen.sh
./configure --prefix=/path/to/install/dir/
make
make install
- execute
/path/to/install/dir/bin/fqtool
- test
fqtool -i ./testdata/r1.fq.gz -I ./testdata/r2.fq.gz -o r1.out.fq.gz -O r2.out.fq.gz -q --kmer --kmer_length 6 -d -a --detect_pe_adapter > run.log 2>&1
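
The test command combines several options that carry `Needs:` and `Excludes:` constraints in the options table. As a rough illustration, the Python sketch below checks an argument list against a small subset of those rules before fqtool is launched. The helper `check_args` and the rule dictionaries are hypothetical; they only copy a few entries from the table and are not part of fqtool, which enforces these constraints itself.

```python
# Subset of the rules listed in the options table; extend as needed.
REQUIRED = ["-i", "-o"]
NEEDS = {
    "-I": ["-i"],
    "-O": ["-I"],
    "-m": ["-I"],
    "--detect_pe_adapter": ["-I"],
    "--kmer_length": ["--kmer"],
    "--dup_ana_key_len": ["-d"],
    "--adapter_of_read1": ["-a"],
}
EXCLUDES = {
    "-I": ["--in_fq_interleaved"],
    "-m": ["-s", "-S"],
    "-s": ["-m", "-S"],
    "-S": ["-m", "-s"],
}


def check_args(argv):
    """Return a list of constraint violations (empty when none are found)."""
    # Assumption: option values never start with '-', so every such token is an option.
    present = {tok for tok in argv if tok.startswith("-")}
    problems = [f"{opt} is required" for opt in REQUIRED if opt not in present]
    for opt in sorted(present):
        for dep in NEEDS.get(opt, []):
            if dep not in present:
                problems.append(f"{opt} needs {dep}")
        for other in EXCLUDES.get(opt, []):
            if other in present:
                problems.append(f"{opt} excludes {other}")
    return problems


# The test command, written as an argument list (shell redirection left out).
test_cmd = [
    "fqtool", "-i", "./testdata/r1.fq.gz", "-I", "./testdata/r2.fq.gz",
    "-o", "r1.out.fq.gz", "-O", "r2.out.fq.gz",
    "-q", "--kmer", "--kmer_length", "6", "-d", "-a", "--detect_pe_adapter",
]

if __name__ == "__main__":
    print(check_args(test_cmd))
```

Passing the arguments as a list (for example to `subprocess.run`) keeps each option and value as a separate token, which avoids accidental token merging from missing spaces.
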
PS
fqtool is modified from fastp with many enhancements, the main ones listed below:
- all ring buffers are fixed and are now true ring buffers, so fqtool will not crash when the input fastq contains too many reads
- json.hpp and ctml.hpp are used to write the json and html reports respectively, which makes further customized development easier
- kmer analysis is disabled by default, and the kmer length used for analysis can be set
- buffer size arguments are externalized as well
- UMIs use the standardized OX:Z tag and UMI qualities use the standardized BZ:Z tag, both appended as comments after read names
- fastq read number estimation and overlap analysis are fixed
- the polyG and polyX trimming method is corrected, with much more control over what to trim and how
- polyG/X trimming results are also displayed in the html report now
- almost all classes/functions have been annotated, haha ~
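
To illustrate the UMI item in the list above, here is a minimal Python sketch of taking a UMI from the front of a read and appending it to the read name as `OX:Z` (UMI bases) and `BZ:Z` (UMI qualities) comments. The function `tag_umi_from_read` is hypothetical, and the space-separated comment layout is an assumption; the parameters only mirror the roles of `--umi_length`, `--umi_skip_length`, `--umi_drop_comment` and `--umi_not_trim`, and fqtool's actual output format may differ.

```python
def tag_umi_from_read(name, seq, qual, umi_length, skip_length=0,
                      trim=True, drop_comment=False):
    """Move the leading UMI of a read into OX:Z / BZ:Z comments on its name."""
    umi = seq[:umi_length]
    umi_qual = qual[:umi_length]
    if drop_comment:
        # Keep only the read identifier, discarding any existing comment.
        name = name.split(None, 1)[0]
    # Assumption: tags are appended after the name, separated by single spaces.
    name = f"{name} OX:Z:{umi} BZ:Z:{umi_qual}"
    if trim:
        # Remove the UMI plus the bases to skip after it.
        cut = umi_length + skip_length
        seq, qual = seq[cut:], qual[cut:]
    return name, seq, qual


if __name__ == "__main__":
    record = tag_umi_from_read(
        "@read1 1:N:0:ATCACG", "AACCGGTTACGTACGT", "IIIIFFFFIIIIIIII",
        umi_length=4, skip_length=2,
    )
    print(record)
```

Carrying the UMI in tag form means downstream tools can copy the comment straight into alignment records without re-parsing the read name.
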