Unintentional trimming of sequences

Question

Closed this issue 9 years ago · 3 comments

Hi,

I tried to correct the publicly available SRR001665 dataset with Lighter using the following command:
nice -10 lighter -r ../SRR001665_1.fastq.gz -r ../SRR001665_2.fastq.gz -k 13 4600000 0.04 -t 64 -od k13/ 2>&1 | tee k13/lighter.log

The correction runs through without problems, but the two resulting fastq files contain 25 and 42 unintentionally trimmed sequences, respectively, such as this one:
@SRR001665.72513 071112_SLXA-EAS1_s_4:1:6:808:233 length=36 cor bad_prefix=7 ak
GCGTGCCGAAGTTAGTGGGCCTGGAGAATC
+
IIIIIIIIIIIIIIIIII3?I/_%.IIII_IIC4I'
All 36 quality scores are still present, but the last bases of the sequence (6 in this case) have been trimmed, so the sequence and quality lines no longer have the same length.
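For anyone who wants to check their own output for the same symptom, here is a minimal sketch that reports records whose sequence and quality lines differ in length. It assumes strict four-line FASTQ records (no wrapped sequence lines); the function name `find_length_mismatches` is only illustrative and is not part of Lighter.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple


def find_length_mismatches(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (header, sequence_length, quality_length) for inconsistent records.

    Assumption: every record is exactly four lines (header, sequence, '+', quality).
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if len(record) < 4:
            break  # end of input, or a truncated trailing record
        header, seq, _plus, qual = record
        if len(seq) != len(qual):
            yield header, len(seq), len(qual)


# The record quoted above: 30 bases but 36 quality scores.
example = [
    "@SRR001665.72513 071112_SLXA-EAS1_s_4:1:6:808:233 length=36 cor bad_prefix=7 ak",
    "GCGTGCCGAAGTTAGTGGGCCTGGAGAATC",
    "+",
    "IIIIIIIIIIIIIIIIII3?I/_%.IIII_IIC4I'",
]
mismatches = list(find_length_mismatches(example))
for header, n_bases, n_quals in mismatches:
    print(f"{header.split()[0]}: {n_bases} bases, {n_quals} quality scores")
```

To scan a gzipped file, the same function can be fed from `gzip.open(path, "rt")`, where `path` is whatever corrected file ended up in the output directory.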

The output is:
[2016-03-22 16:44:22] =============Start====================
[2016-03-22 16:44:24] Bad quality threshold is "&"
[2016-03-22 16:45:33] Finish sampling kmers
[2016-03-22 16:45:33] Bloom filter A's false positive rate: 0.001899
[2016-03-22 16:47:13] Finish storing trusted kmers
[2016-03-22 16:52:13] Finish error correction
Processed 20816448 reads:
18328409 are error-free
Corrected 3617298 bases(1.453875 corrections for reads with errors)
Trimmed 0 reads with average trimmed bases 0.000000
Discard 0 reads
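
Note that the summary reports "Trimmed 0 reads" even though the sequences above are shortened. As a quick sanity check of the other numbers, the short sketch below recomputes the per-read correction figure from the log. It assumes that "reads with errors" means processed reads minus error-free reads, which is my reading of the log and not something the tool's output states explicitly.

```python
# Values copied from the log above.
processed_reads = 20816448
error_free_reads = 18328409
corrected_bases = 3617298

# Assumption: reads with errors = processed reads - error-free reads.
reads_with_errors = processed_reads - error_free_reads
corrections_per_read = corrected_bases / reads_with_errors

print(f"reads with errors: {reads_with_errors}")
print(f"corrections per read with errors: {corrections_per_read:.6f}")
```

Under that assumption the result agrees with the 1.453875 reported in the log, so the correction statistics look internally consistent and only the trimming count is at odds with the output files.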

Answer 1 · 2016-03-22T18:02:56.000Z

I'll download that data set and take a look at it.

Thanks for letting me know.

Answer 2 · 2016-03-22T23:12:32.000Z

I think I've fixed the bug. Can you pull the new version and give it a try?

Thanks.

Answer 3 · 2016-03-23T10:09:59.000Z

Works fine for me now.
Awesome how fast you fixed it.
Thanks