Hi guys, this feature has been revised and greatly improved in fastp v0.19.9 (to be released soon); see the update here:
jrostudent opened this issue
merge paired-end reads
For paired-end (PE) input, fastp supports stitching (merging) each read pair by specifying the -m/--merge option. In this merging mode:
- --merged_out should be given to specify the file that stores merged reads; otherwise, enable --stdout to stream the merged reads to STDOUT. The merged reads are also filtered.
- --out1 and --out2 will receive the reads that cannot be merged successfully but both pass all the filters.
- --unpaired1 will receive the reads that cannot be merged, where read1 passes the filters but read2 doesn't.
- --unpaired2 will receive the reads that cannot be merged, where read2 passes the filters but read1 doesn't.
- --include_unmerged can be enabled to redirect the reads of --out1, --out2, --unpaired1 and --unpaired2 to --merged_out, so you get a single output file. This option is disabled by default.
- --failed_out can still be given to store the reads (either merged or unmerged) that fail to pass the filters.
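The routing rules above can be condensed into a small decision function for a pair that could not be merged (a merged pair is simpler: its single read goes to --merged_out, or STDOUT with --stdout, if it passes the filters, and to --failed_out otherwise). This is a hypothetical Python sketch for illustration, not fastp code; the name route_unmerged is made up, and sending the failing mate of a half-passing pair to --failed_out is an assumption drawn from the description of that option.

```python
from typing import Optional, Tuple

# Outputs that --include_unmerged folds into --merged_out.
UNMERGED_OUTPUTS = {"--out1", "--out2", "--unpaired1", "--unpaired2"}


def route_unmerged(r1_pass: bool, r2_pass: bool,
                   include_unmerged: bool = False,
                   failed_out: bool = False) -> Tuple[Optional[str], Optional[str]]:
    """Return the output option receiving (read1, read2) for a pair that
    could NOT be merged. None means the read is not written anywhere.

    Hypothetical helper; routing a failing mate to --failed_out is assumed.
    """
    failed = "--failed_out" if failed_out else None
    if r1_pass and r2_pass:
        dest = ("--out1", "--out2")
    elif r1_pass:
        dest = ("--unpaired1", failed)
    elif r2_pass:
        dest = (failed, "--unpaired2")
    else:
        dest = (failed, failed)

    def redirect(d: Optional[str]) -> Optional[str]:
        # --include_unmerged sends every unmerged output to --merged_out.
        return "--merged_out" if include_unmerged and d in UNMERGED_OUTPUTS else d

    return redirect(dest[0]), redirect(dest[1])
```

With --include_unmerged enabled, every passing read ends up in --merged_out, which is why a single output file results.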
In the output file, a tag like merged_xxx_yyy is added to each read name to indicate how many base pairs come from read1 and from read2, respectively. For example, @NB551106:9:H5Y5GBGX2:1:22306:18653:13119 1:N:0:GATCAG merged_150_15 means that 150 bp are from read1 and 15 bp are from read2. fastp prefers the bases in read1, since they usually have higher quality than those in read2.
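Since every base of a merged read comes from either read1 or read2, the two numbers in the tag should add up to the merged read's length. A minimal Python sketch for reading the tag back out of a header, assuming it appears as its own whitespace-separated field as in the example (parse_merged_tag is a hypothetical name):

```python
import re
from typing import Optional, Tuple

# Matches a whitespace-separated "merged_<from_read1>_<from_read2>" field.
MERGED_TAG = re.compile(r"(?:^|\s)merged_(\d+)_(\d+)(?=\s|$)")


def parse_merged_tag(header: str) -> Optional[Tuple[int, int]]:
    """Return (bases_from_read1, bases_from_read2), or None if there is no tag."""
    match = MERGED_TAG.search(header)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Applied to the example header above, it returns (150, 15), i.e. a 165 bp merged read.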
The merging function is also based on overlap detection, which has two adjustable parameters: overlap_len_require (default 30) and overlap_diff_limit (default 5).
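The idea behind overlap-based merging can be illustrated with a deliberately simplified Python sketch; it is not fastp's implementation. It assumes read2 is reverse-complemented before comparison, tries offsets from the longest overlap downward, and accepts the first offset with at least overlap_len_require overlapping bases and at most overlap_diff_limit mismatches. It ignores base qualities and the case where the insert is shorter than the reads. The function names are hypothetical.

```python
from typing import Optional, Tuple

# Complement table for DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_overlap(r1: str, r2: str, overlap_len_require: int = 30,
                 overlap_diff_limit: int = 5) -> Optional[int]:
    """Return the offset in r1 at which reverse-complemented r2 starts, or None."""
    rc2 = reverse_complement(r2)
    # Smallest offset first = longest overlap first.
    for offset in range(len(r1) - overlap_len_require + 1):
        overlap_len = min(len(r1) - offset, len(rc2))
        if overlap_len < overlap_len_require:
            continue
        window = zip(r1[offset:offset + overlap_len], rc2[:overlap_len])
        diff = sum(1 for a, b in window if a != b)
        if diff <= overlap_diff_limit:
            return offset
    return None


def merge_pair(r1: str, r2: str, overlap_len_require: int = 30,
               overlap_diff_limit: int = 5) -> Optional[Tuple[str, str]]:
    """Return (merged_sequence, 'merged_xxx_yyy' tag), or None if no overlap."""
    offset = find_overlap(r1, r2, overlap_len_require, overlap_diff_limit)
    if offset is None:
        return None
    rc2 = reverse_complement(r2)
    overlap_len = min(len(r1) - offset, len(rc2))
    # Overlapping bases are taken from read1; read2 only adds its tail.
    tail = rc2[overlap_len:]
    return r1 + tail, f"merged_{len(r1)}_{len(tail)}"
```

Keeping read1's bases across the whole overlap mirrors the stated preference for read1, and the tag then counts all of read1 plus only the non-overlapping tail of read2, as in merged_150_15.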
Originally posted by @sfchen in #31 (comment)
Hey,
So I'm using KMC to manipulate k-mers in PE .fastq files merged and filtered by fastp. It seems that in the merged output file the "merged_xxx_yyy" tag is preventing KMC from reading the file. Is there a way to prevent this tag from being added?
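If the tag is indeed what KMC rejects, one possible workaround is to rewrite the headers before counting. The sketch below is a hypothetical Python filter (strip_merged_tags is a made-up name, and whether KMC then accepts the file has not been verified here). It walks strict 4-line FASTQ records and removes the merged_xxx_yyy field from header lines only.

```python
import re
from typing import Iterable, Iterator

# A whitespace-separated "merged_<n>_<m>" field, including the space before it.
MERGED_TAG = re.compile(r"\s+merged_\d+_\d+(?=\s|$)")


def strip_merged_tags(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with the merged_xxx_yyy tag removed from header lines.

    Assumes strict 4-line records (header, sequence, '+', quality).
    """
    for index, line in enumerate(lines):
        # Select headers by position, not by a leading '@': a quality
        # line may also start with '@'.
        if index % 4 == 0:
            line = MERGED_TAG.sub("", line)
        yield line
```

For example, sys.stdout.writelines(strip_merged_tags(sys.stdin)) streams a cleaned copy; for gzip-compressed output, open the file with gzip.open(path, "rt") first.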