fastq-mcf: Allow merging multiple FASTQ input files to one (or two in case of paired-end) cleaned fastq file
Closed this issue · 4 comments
GoogleCodeExporter commented
It would be nice if fastq-mcf could write multiple input FASTQ files to one
cleaned file (single-end sequencing) or to only two output FASTQ files (forward
and reverse), so it is possible to clean FASTQ files from the same sample
located in multiple lanes at once and only have one (or two) FASTQ files, so
you don't need to create a merged cleaned FASTQ file afterwards yourself.
Also is this TODO still valid? http://code.google.com/p/ea-utils/wiki/FastqMcf
====================
When discarding one read for being "too short", it has to discard both pairs.
For a sequencing run of normal quality this is not an issue. It should, though,
write "un-mated" reads (whose mate was skipped) to a separate file. Typically,
since these read mates were poor quality, it's not really useful... but it can
be for diagnostics. I've seen runs where these provide valuable data.
====================
When cleaning paired-end reads, the cleaned forward and reverse FASTQ files
seem to have the same number of reads.
It would also be nice to have the number of reads in the cleaned files itself
in the statistics info:
Kept reads = ((Total reads) - (Too short after clip))
Files: 2
Total reads: 46911033
Too short after clip: 1420553
Clipped 'end' reads (RNAseq.R1.fastq.gz): Count 1369131, Mean: 23.85, Sd: 15.35
Trimmed 5391478 reads (RNAseq.R1.fastq.gz) by an average of 17.01 bases on
quality < 15
Clipped 'end' reads (RNAseq.R2.fastq.gz): Count 1113190, Mean: 16.53, Sd: 8.83
Trimmed 9609085 reads (RNAseq.R2.fastq.gz) by an average of 26.38 bases on
quality < 15
Original issue reported on code.google.com by hulselma...@gmail.com
on 9 Oct 2013 at 1:21
GoogleCodeExporter commented
If you have, say, 2 fastq's , you can do this:
rm fif; mkfifo fif;
( gunzip -c fq1.gz >> fif; gunzip -c fq1.gz >> fif ) &
fastq-mcf adap.txt fif > out
Unlike much bioinformatics software, fastq-mcf is "stream friendly" ....
meaning it buffers what it needs during initial sampling steps.
Original comment by earone...@gmail.com
on 9 Oct 2013 at 2:41
GoogleCodeExporter commented
Original comment by earone...@gmail.com
on 9 Oct 2013 at 2:41
- Changed state: WontFix
GoogleCodeExporter commented
Thanks.
Thanks for this very useful tip.
It can also be done without explicitly making named pipes, but by using process
substitution:
http://tldp.org/LDP/abs/html/process-sub.html
fastq-mcf \
-o cleaned.R1.fq.gz \
-o cleaned.R2.fq.gz \
adapters.fa \
<(zcat uncleaned.lane1.R1.fq.gz uncleaned.lane2.R1.fq.gz;) \
<(zcat uncleaned.lane1.R2.fq.gz uncleaned.lane2.R2.fq.gz;)
fastq-mcf \
-o cleaned.R1.fq.gz \
-o cleaned.R2.fq.gz \
adapters.fa \
<(gunzip -c uncleaned.lane1.R1.fq.gz uncleaned.lane2.R1.fq.gz;) \
<(gunzip -c uncleaned.lane1.R2.fq.gz uncleaned.lane2.R2.fq.gz;)
Can this info being added to the wiki?
http://code.google.com/p/ea-utils/wiki/FastqMcf
Thanks
Original comment by hulselma...@gmail.com
on 10 Oct 2013 at 8:05
GoogleCodeExporter commented
Cool! I didn't even know how process substitution worked.
Original comment by earone...@gmail.com
on 10 Oct 2013 at 2:24