Output fastq file becomes corrupt
pinetree1 opened this issue · 3 comments
I had a pair of fastq files, each with about 350 million reads (42 GB). After running NxTrim, the output fastq files were corrupted (gunzip error: unexpected end of file). The bad output fastq files (.gz) were between 767426560 and 2168553472 bytes in size. I expected the result files to be much bigger if things had worked correctly.
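For anyone hitting the same gunzip error, a quick way to confirm which outputs are damaged is to run gzip's built-in integrity test on each file. The sketch below is only an illustration: the function name check_gz and the file names in the usage comment are made up for this example, and a failed test means the file is corrupt in some way, of which truncation is one common cause.

```shell
# check_gz FILE...
# Runs "gzip -t" (integrity test) on every file given and prints one line
# per file. Returns non-zero if at least one file failed the test.
# check_gz is a hypothetical helper name, not part of NxTrim.
check_gz() {
    cg_status=0
    for cg_file in "$@"; do
        if gzip -t "$cg_file" 2>/dev/null; then
            echo "OK   $cg_file"
        else
            echo "BAD  $cg_file"
            cg_status=1
        fi
    done
    return "$cg_status"
}

# Example (file names are placeholders):
#   check_gz sample.*.fastq.gz
```

A file that fails this test ends mid-stream or is otherwise damaged, so any read counts taken from it will undercount what a complete run would have produced.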
Hi there,
Sorry you are having a problem. It sounds like nxtrim terminated prematurely.
What command did you use and did it run to completion?
You will see a summary (stderr) like this if the tool finished correctly:
Writing to stdout
Trimming:
R1: example/MP_R1.fastq.gz
R2: example/MP_R2.fastq.gz
Trimming summary:
54 / 54 ( 100.00% ) reads passed chastity/purity filters.
0 / 54 ( 0.00% ) reads had multiple copies of adapter (filtered).
0 / 54 ( 0.00% ) read pairs were ignored because template length appeared less than read length
54 remaining reads were trimmed
54 / 54 ( 100.00% ) read pairs had MP orientation
0 / 54 ( 0.00% ) read pairs had PE orientation
0 / 54 ( 0.00% ) read pairs had unknown orientation
0 / 54 ( 0.00% ) were single end reads
0 / 54 ( 0.00% ) extra single end reads were generated from overhangs
Thanks for replying.
My command was:
nxtrim -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz --separate --out sample
It exited quietly without any error message, but the last line in the trim log was: READ PAIR 73980000. There was no summary in the log.
I also tried without the --separate option and it ended the same way. The last line in the log was: READ PAIR 146640000. The largest output fastq file was sample.mp.fastq.gz, at 8352284672 bytes.
My input fastq files had 418 million pairs of reads. So without the --separate option, nxtrim stopped after about 7 hours, when roughly 1/3 of the job was done.
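The "about 1/3" estimate can be checked directly from the log. Here is a rough sketch, assuming the stderr log was saved to a file and that progress lines look like "READ PAIR <number>" with the count as the last field, as in the lines quoted above. The function name nxtrim_progress and the log file name are invented for this example.

```shell
# nxtrim_progress LOG TOTAL_PAIRS
# Reports whether the log contains the final "Trimming summary:" block.
# If it does not, prints how far the run got based on the last
# "READ PAIR <n>" line. Returns 0 for a complete run, 1 otherwise.
# nxtrim_progress is a hypothetical helper name, not part of NxTrim.
nxtrim_progress() {
    np_log=$1
    np_total=$2
    if grep -q 'Trimming summary:' "$np_log"; then
        echo "summary found: run completed"
        return 0
    fi
    # Take the number at the end of the last progress line (0 if none).
    np_last=$(grep 'READ PAIR' "$np_log" | tail -n 1 | awk '{print $NF}')
    np_last=${np_last:-0}
    awk -v n="$np_last" -v t="$np_total" 'BEGIN {
        printf "no summary: stopped at pair %d of %d (%.1f%%)\n", n, t, 100 * n / t
    }'
    return 1
}

# Example (log file name is a placeholder):
#   nxtrim_progress trim.log 418000000
```

With the numbers from this thread, 146640000 of 418000000 pairs works out to about 35%, and 73980000 pairs (the --separate run) to about 18%.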
It definitely did not finish, which is why you have truncated gzip files. It is hard to say why the job is dying, but normally we would see some sort of error if there was a bug. Is it possible the process was killed by a job scheduler?
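One way to test the scheduler theory is to record the exit status of the run. In a POSIX shell, a status above 128 usually means the process was terminated by a signal (status minus 128 is the signal number, so 137 corresponds to SIGKILL), although a program can also return such a value on its own. The wrapper below is a minimal sketch, and run_and_report is a made-up name for this example.

```shell
# run_and_report COMMAND [ARGS...]
# Runs the command, then prints how it ended and returns its exit status.
# run_and_report is a hypothetical helper name, not part of NxTrim.
run_and_report() {
    rr_rc=0
    "$@" || rr_rc=$?
    if [ "$rr_rc" -eq 0 ]; then
        echo "finished normally (exit status 0)"
    elif [ "$rr_rc" -gt 128 ]; then
        echo "terminated by signal $((rr_rc - 128)) (exit status $rr_rc)"
    else
        echo "failed with exit status $rr_rc"
    fi
    return "$rr_rc"
}

# Example, using the command from this thread (trim.log is a placeholder):
#   run_and_report nxtrim -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
#       --separate --out sample 2> trim.log
```

If the report shows signal 9 or 15, the process was most likely stopped from outside (for example by a scheduler's time or memory limit) rather than by a bug in the tool, and the scheduler's own accounting logs are the next place to look.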
As an aside, you may not want to use the default behavior, which is really tuned for bacterial assembly.
Something like:
nxtrim --stdout -1 1.fastq.gz -2 2.fastq.gz | gzip -1 > trimmed.fq.gz
could be more suitable if you are scaffolding or aligning.