nxtrim does not remove Illumina adapters when --rf is used
Closed this issue · 6 comments
Hi,
the runtime help is a bit unclear about what the --rf option does. It appears that nxtrim v0.4.1-7db257e converts RF reads into FR by default. I found that it does not remove Illumina adapters when I use --rf to override this default behavior, probably because it searches for the adapters only in the "forward" representation and does not also try the reverse-complement form. Hence, adapter removal does not work when --rf is used. Am I guessing correctly?
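The guess above can be illustrated with a small sketch. It assumes, purely for illustration, that a trimmer looks for an exact seed taken from the start of the adapter; `revcomp` and `find_adapter` are hypothetical helpers, not nxtrim's code, and real trimmers also tolerate mismatches and partial overlaps. Searching for both the seed and its reverse complement finds the adapter whichever orientation the read is written in; a forward-only search would miss the second read below.

```python
# Hypothetical sketch, not nxtrim's adapter matcher: exact seed search only.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_adapter(read: str, adapter: str, seed_len: int = 12):
    """Return (0-based position, strand) of an exact seed hit, or None.

    The seed is the first `seed_len` bases of the adapter. Strand '+' means the
    seed was found as given; '-' means its reverse complement was found.
    """
    seed = adapter[:seed_len]
    for strand, probe in (("+", seed), ("-", revcomp(seed))):
        pos = read.find(probe)
        if pos != -1:
            return pos, strand
    return None


# First 34 bases of the adapter query used in the BLAST hits later in this thread.
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
fwd_read = "ACGTACGTAC" + ADAPTER
print(find_adapter(fwd_read, ADAPTER))           # (10, '+')
print(find_adapter(revcomp(fwd_read), ADAPTER))  # (22, '-')
```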
Further, I propose renaming --rf to --disable-RF-to-FR-conversion.
Finally, would you mind documenting in the README file which assemblers/mappers cannot work with RF reads and require FR input?
Hello!
The --rf tag should return reads in reverse-forward orientation without any other difference. If adapter removal is affected by --rf, then it is a bug. If you can send me an example read pair showing the problem, that would be much appreciated.
I'm not keen to change the name of the argument since people might already be relying on the current name.
I have documented how to use SPAdes/Velvet with the nxtrim, but I don't have experience with other assemblers. Perhaps a wiki entry listing successful assemblies by users could be added, similar to this one: https://github.com/sequencing/NxTrim/wiki/Bacterial-assemblies-using-Nextera-Mate-pairs
> The --rf tag should return reads in reverse-forward orientation without any other difference.
I had the impression they are "naturally" in RF orientation. Therefore, I thought this option is to flip them so that they appear as ordinary FR reads. Would you mind clarifying the description of the option? Currently it states:
--rf leave reads in RF orientation (or use this if your reads are already in FR orientation)
> I have documented how to use SPAdes/Velvet with the nxtrim, but I don't have experience with other assemblers.
Well, it seemed you know that it is generally safer to flip RF reads into FR reads. Also, the Illumina docs on Data Processing of Nextera Mate Pair Reads ... seemed to be on the same wavelength.
I will prepare some testcases.
> I had the impression they are "naturally" in RF orientation.
This is correct.
> Therefore, I thought this option is to flip them so that they appear as ordinary FR reads.
It is the opposite. Basically, if you want FR reads, do not use --rf. If you want RF reads, use --rf.
The default behaviour of nxtrim (without --rf) takes the "natural" RF reads and reverse-complements them when appropriate, so that the output is FR (the desired orientation for Velvet/SPAdes). Note there is a complication here: the virtual PE reads are already FR, so they do not need to be reverse-complemented.
The --rf flag does the opposite of this default behaviour, returning reads as RF.
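A minimal model of the behaviour described above, assuming orientation is decided per read pair and that --rf requests RF output for every pair: when a pair's orientation differs from the requested one (FR by default, RF with --rf), both mates are reverse-complemented and their quality strings reversed; otherwise the pair passes through unchanged. `Read`, `revcomp_read` and `orient_pair` are hypothetical names, and this is not nxtrim's implementation.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


@dataclass(frozen=True)
class Read:
    name: str
    seq: str
    qual: str


def revcomp_read(read: Read) -> Read:
    """Reverse-complement the bases and reverse the qualities so they stay paired."""
    return Read(read.name, read.seq.translate(COMPLEMENT)[::-1], read.qual[::-1])


def orient_pair(r1: Read, r2: Read, input_orientation: str, keep_rf: bool = False):
    """Return (r1, r2) in FR orientation, or in RF when keep_rf mimics --rf.

    input_orientation is "RF" for mate-pair reads and "FR" for reads that are
    already FR, such as the virtual PE reads.
    """
    wanted = "RF" if keep_rf else "FR"
    if input_orientation == wanted:
        return r1, r2
    return revcomp_read(r1), revcomp_read(r2)


mp1, mp2 = Read("mp/1", "AACG", "ABCD"), Read("mp/2", "GGTT", "EFGH")
print(orient_pair(mp1, mp2, "RF"))                # reverse-complemented into FR
print(orient_pair(mp1, mp2, "RF", keep_rf=True))  # left as RF
```

Reversing the quality string together with the bases is what keeps each score attached to the base it describes.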
> --rf leave reads in RF orientation (or use this if your reads are already in FR orientation)
Still, the documentation string is confusing. Does "(or use this if your reads are already in FR orientation)" really mean it will flip input FR reads into the desired RF orientation? I don't believe so. ;-)
At the very least, I would propose:
--rf leave mate pair reads in RF orientation as they are [by default they are flipped into FR]
Agreed. I changed the help text accordingly.
Please note the README describes the nxtrim process in detail:
The trimmer will reverse-complement the reads such that the resulting libraries will be in Forward-Reverse (FR) orientation, if you wish to keep your reads as Reverse-Forward then use --rf flag.
Here are reads containing TruSeq_Adapter_Index_6 with the GCCAAT barcode, as output by nxtrim (after the RF-to-FR conversion). Please note that these reads come from an Illumina NextSeq, so nxtrim turned the trailing poly-G into a leading poly-C. IMHO this will cause issues with downstream trimming of reads. At the very least, it should be emphasized in the README file:
> NB501598:62:HFYJ5AFXX:1:11305:22286:4051 1:N:0:GCCAAT
Length=155
Score = 127 bits (64), Expect = 2e-28
Identities = 64/64 (100%), Gaps = 0/64 (0%)
Strand=Plus/Minus
Query  1    AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTG  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  79   AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTG  20
Query  61   CTTG  64
            ||||
Sbjct  19   CTTG  16
> NB501598:62:HFYJ5AFXX:1:11201:5173:2350 1:N:0:GCCAAT
Length=155
Score = 127 bits (64), Expect = 2e-28
Identities = 64/64 (100%), Gaps = 0/64 (0%)
Strand=Plus/Minus
Query  1    AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTG  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  68   AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTG  9
Query  61   CTTG  64
            ||||
Sbjct  8    CTTG  5
> NB501598:62:HFYJ5AFXX:1:21203:4547:6784 1:N:0:GCCAAT
Length=155
Score = 125 bits (63), Expect = 7e-28
Identities = 63/63 (100%), Gaps = 0/63 (0%)
Strand=Plus/Minus
Query  2    GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGC  61
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  155  GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGC  96
Query  62   TTG  64
            |||
Sbjct  95   TTG  93
> NB501598:62:HFYJ5AFXX:1:11108:9901:15649 1:N:0:GCCAAT
Length=155
Score = 119 bits (60), Expect = 4e-26
Identities = 63/64 (98%), Gaps = 0/64 (0%)
Strand=Plus/Minus
Query  1    AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTG  60
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||
Sbjct  155  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTGCTG  96
Query  61   CTTG  64
            ||||
Sbjct  95   CTTG  92
> NB501598:62:HFYJ5AFXX:1:11103:16230:15181 1:N:0:GCCAAT
Length=155
Score = 119 bits (60), Expect = 4e-26
Identities = 63/64 (98%), Gaps = 0/64 (0%)
Strand=Plus/Minus
Query  1    AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTG  60
            |||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
Sbjct  155  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATATCGTATGCCGTCTTCTG  96
Query  61   CTTG  64
            ||||
Sbjct  95   CTTG  92
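Strand=Plus/Minus in these hits means the adapter query matched the reverse complement of each read. This can be checked directly for read 11201:5173:2350, whose hit spans Sbjct 68 to 5: the reverse complement of the 64-nt query occurs at 1-based positions 5 to 68 of the read, while the query itself does not. The sketch uses only the first 78 bases of that read, copied from its FASTQ record below; `locate` is a hypothetical helper that does exact matching only.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def locate(read: str, query: str):
    """Return (strand, 1-based start, 1-based end) of an exact hit, or None."""
    for strand, probe in (("Plus", query), ("Minus", revcomp(query))):
        pos = read.find(probe)
        if pos != -1:
            return strand, pos + 1, pos + len(probe)
    return None


# The 64-nt adapter query from the BLAST hits above.
QUERY = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGCCGTCTTCTGCTTG"
# First 78 bases of NB501598:62:HFYJ5AFXX:1:11201:5173:2350.
READ = "TTTTCAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTATAGAAGTCC"

print(locate(READ, QUERY))  # ('Minus', 5, 68)
```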
@NB501598:62:HFYJ5AFXX:1:11103:16230:15181 1:N:0:GCCAAT
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCACCCCCCCCCCCCCCCCCTATTTTTTTTTCAAGCAGAAGACGGCATACGATATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
+
A/AEEEA<A<<E</EE/A/<6AAA/A//<E///A/E/EEE/EE/E////6/</<///<<//A/<///E</////////////AEE6EEEEE/EAEE/EAEEEAEE6/EE//EEEEEEE//</EAE/EEEEEEEEEAE//E/EEEE///EEAAA/A
@NB501598:62:HFYJ5AFXX:1:11108:9901:15649 1:N:0:GCCAAT
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCATTTTTTTTTCAAGCAGCAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
+
/AEAEAEEEEEE<<EE<<AE<<<AEEEEEEEEEEAEEE</AE<A<EAAAEAEEEE/EEEEEEE/<//EAE/EAEEE/A/////EEE66E//A/////E/EEEE//EEEAA///EAE//E/E///EEEEEEE//EAEEAEEA6EE6E6/EEAAAA/
@NB501598:62:HFYJ5AFXX:1:11201:5173:2350 1:N:0:GCCAAT
TTTTCAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTATAGAAGTCCAAAAAGCTTGGCCATCACGTGATTCATGGAGACTTCAGTTAGCTCCTGGAAGCTCATAGTGAGCCATTGAAATACAT
+
EAAAEEA<<AEA<EEEEEE/EEEEAE</EEEEEEE/EEEAEEEE//EEA<EEEAEEEEEAEEAEEEEEE/EEEEEAEEEEEEAEEEE<EEEE6EEEEEEEE/EEEEAEEEAEEEEEEEEEE6EEEAEEEAEEEEEEE6EEEEEEEEEEEEAAAAA
@NB501598:62:HFYJ5AFXX:1:11305:22286:4051 1:N:0:GCCAAT
CCCGCTTTTTTTTTTCAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTGGTCCATCCTTCTAACGTTTATCTGTTTATATTATTCGAGGGTTTATGTGGGTTGGTTTATTTGTTTATGATTAA
+
<///////EEEAEAEA//<</6E/A/6/EAEE/A////A/EE/E<</EA/<6//<AAAA///AEAEA/E/EA//6/E/EE6<E//AE//EEAA/////E////E///////6E/AE//////EE//EEEAEEAE//6//EA//A/EEE6EAA/AA
@NB501598:62:HFYJ5AFXX:1:21203:4547:6784 1:N:0:GCCAAT
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCATTTTTTTTTTCAAGCAGAAGACGGCATACGAGATATTGGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
+
/<A<<A/AA/<A<//<EAA<</A6A/EA/EAEAA<EAEEEEEEEEAEAA</EAEEEEEEEEEEEAEEEEEEEEAEEA</E////EEEEEE/E/EEEEEEEEEEEEAEEEEE<E/EEEAEEEEEEEEEE<AEEEAEEE/EEEEEEEEEEEEAAA/A
Since you have already closed this bug, let's move the problem of unremoved Illumina adapters to a new issue. I included the above to emphasize what the blind RF-to-FR conversion causes.
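The poly-G effect reported with these reads follows directly from reverse-complementing. On two-colour instruments such as the NextSeq, a base with no signal is called as G, so reads can end in an artificial poly-G tail; after the RF-to-FR conversion that tail becomes a leading poly-C, as in several of the records above, and a step that only trims 3' poly-G would likely miss it. The sketch below shows the effect and a hypothetical `trim_leading_polyc` helper. It only handles an uninterrupted run of C, whereas read 11103:16230:15181 has an A inside its run, so a practical trimmer would need to tolerate mismatches.

```python
import re

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_leading_polyc(seq: str, qual: str, min_run: int = 10):
    """Drop a leading run of C of at least min_run bases (exact run only)."""
    run = re.match(r"C*", seq).end()
    if run >= min_run:
        return seq[run:], qual[run:]
    return seq, qual


# Hypothetical NextSeq-style read: real sequence followed by a poly-G tail.
seq, qual = "ATTGACAT" + "G" * 15, "I" * 8 + "#" * 15
fr_seq, fr_qual = revcomp(seq), qual[::-1]
print(fr_seq)                               # CCCCCCCCCCCCCCCATGTCAAT
print(trim_leading_polyc(fr_seq, fr_qual))  # ('ATGTCAAT', 'IIIIIIII')
```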