Unintentional trimming of sequences
Hi,
I tried to correct the publicly available SRR001665 dataset with lighter using the following command:
nice -10 lighter -r ../SRR001665_1.fastq.gz -r ../SRR001665_2.fastq.gz -k 13 4600000 0.04 -t 64 -od k13/ 2>&1 | tee k13/lighter.log
The correction runs through without problems, but the two resulting FASTQ files contain 25 and 42 unintentionally trimmed sequences, respectively, such as this one:
@SRR001665.72513 071112_SLXA-EAS1_s_4:1:6:808:233 length=36 cor bad_prefix=7 ak
GCGTGCCGAAGTTAGTGGGCCTGGAGAATC
+
IIIIIIIIIIIIIIIIII3?I/_%.IIII_IIC4I'
All 36 quality scores are still present, but the last bases of the sequence (6 in this case) have been trimmed.
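One way to count such records is to compare the length of each sequence line with the length of its quality line. Below is a minimal Python sketch of that check. It is not part of lighter, the function names are hypothetical, and it assumes standard 4-line FASTQ records with no line wrapping.

```python
import gzip
import sys
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def iter_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group a stream of lines into 4-line FASTQ records.

    Assumption: records are not line-wrapped (one sequence line, one quality line).
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it, "").rstrip("\r\n")
        next(it, "")  # the '+' separator line, ignored here
        qual = next(it, "").rstrip("\r\n")
        yield header.rstrip("\r\n"), seq, qual


def find_mismatches(records: Iterable[Record]) -> List[Tuple[str, int, int]]:
    """Return (header, len(seq), len(qual)) for every record whose lengths differ."""
    return [(h, len(s), len(q)) for h, s, q in records if len(s) != len(q)]


def check_file(path: str) -> List[Tuple[str, int, int]]:
    """Scan a plain or gzip-compressed FASTQ file (chosen by the .gz suffix)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return find_mismatches(iter_records(fh))


if __name__ == "__main__":
    # Usage: pass one or more FASTQ paths on the command line.
    for path in sys.argv[1:]:
        bad = check_file(path)
        print(f"{path}: {len(bad)} records with sequence/quality length mismatch")
        for header, n_seq, n_qual in bad[:5]:
            print(f"  {header}  seq={n_seq} qual={n_qual}")
```

Run against the corrected output files, the record above would show up with `seq=30 qual=36`.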
The log output is:
[2016-03-22 16:44:22] =============Start====================
[2016-03-22 16:44:24] Bad quality threshold is "&"
[2016-03-22 16:45:33] Finish sampling kmers
[2016-03-22 16:45:33] Bloom filter A's false positive rate: 0.001899
[2016-03-22 16:47:13] Finish storing trusted kmers
[2016-03-22 16:52:13] Finish error correction
Processed 20816448 reads:
18328409 are error-free
Corrected 3617298 bases(1.453875 corrections for reads with errors)
Trimmed 0 reads with average trimmed bases 0.000000
Discard 0 reads
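The summary figures in this log are internally consistent. The "corrections for reads with errors" value matches the total number of corrected bases divided by the number of reads that were not error-free. This reading is inferred from the numbers themselves, not from lighter's documentation, so treat it as an assumption. The arithmetic:

```python
# Figures copied from the lighter log above.
processed = 20816448
error_free = 18328409
corrected_bases = 3617298

# Assumption: reads "with errors" = processed reads that were not error-free.
reads_with_errors = processed - error_free
ratio = corrected_bases / reads_with_errors

# Prints: 2488039 reads with errors, 1.453875 corrections per such read
print(f"{reads_with_errors} reads with errors, {ratio:.6f} corrections per such read")
```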
I'll download that data set and take a look at it.
Thanks for letting me know.
I think I've fixed the bug. Can you pull the new version and give it a try?
Thanks.
Works fine for me now.
Awesome how fast you fixed it.
Thanks