marcelm/cutadapt

overlapping prefix adapters

marcelm opened this issue · 3 comments

From marcel.m...@tu-dortmund.de on May 08, 2012 17:27:10

The new -g ^ADAPTER option isn’t enough. There has been a request to allow less strict anchoring, where the adapter overlaps the beginning of the read.

This is easily achieved by this change:
-PREFIX = align.STOP_WITHIN_SEQ2
+PREFIX = align.STOP_WITHIN_SEQ2 | align.START_WITHIN_SEQ1

The question is whether that is the desired behaviour or whether both versions should be possible.
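To make the two behaviours concrete, here is a minimal sketch of the difference. It is a toy model, not cutadapt's aligner: it counts mismatches only (no indels), uses a fixed error count instead of an error rate, and the function names and the `min_overlap` parameter are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def anchored_prefix_match(adapter, read, max_errors):
    """Strict anchoring: the whole adapter must sit at the read start."""
    if len(read) < len(adapter):
        return None
    if hamming(adapter, read[:len(adapter)]) <= max_errors:
        return len(adapter)  # number of read bases to trim
    return None


def overlapping_prefix_match(adapter, read, max_errors, min_overlap=3):
    """Less strict anchoring: the first `skip` adapter bases may be missing,
    so only a suffix of the adapter has to match the read start."""
    for skip in range(0, len(adapter) - min_overlap + 1):
        tail = adapter[skip:]
        if len(tail) <= len(read) and hamming(tail, read[:len(tail)]) <= max_errors:
            return len(tail)  # number of read bases to trim
    return None


adapter = "ACGTACGT"
read = "TACGTGGGG"  # the first three adapter bases are missing from the read
print(anchored_prefix_match(adapter, read, max_errors=0))
print(overlapping_prefix_match(adapter, read, max_errors=0))
```

In this model the strict version rejects the read, while the overlapping version finds the remaining five adapter bases at the read start.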

Original issue: http://code.google.com/p/cutadapt/issues/detail?id=43

From seb.th...@gmail.com on November 07, 2014 08:01:56

Yes, this behaviour could be very useful. I have tons of examples where my MID has been "eaten" at the beginning of the read. Combining -e and -O allows substitutions, and that's too permissive.

And by the way, thanks for the trick.

From marcel.m...@tu-dortmund.de on November 10, 2014 05:35:23

Thanks for the feedback! I’ll try to come up with a way of allowing this. It may take a while, however (although I’m aware that this report is already two years old).

This can be achieved with -g XXXXADAPTER. The X characters are interpreted as IUPAC wildcards that do not match any nucleotide, so every X aligned to a read base counts as an error. If the number of X characters is higher than the number of allowed errors, the match will never be found within the read.
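The effect of the X padding can be illustrated with a small sketch. It is a simplified model and not cutadapt's actual alignment code: it counts mismatches only, uses a fixed error count, lets the pattern overhang only the 5' end of the read, and the name `find_padded` and its parameters are hypothetical.

```python
def find_padded(adapter, read, n_x, max_errors, min_overlap=3):
    """Return all offsets at which "X" * n_x + adapter matches the read.

    offset is the read position aligned with the first pattern character;
    a negative offset means the pattern overhangs the 5' end of the read.
    """
    pattern = "X" * n_x + adapter
    hits = []
    for offset in range(-(len(pattern) - min_overlap), len(read) - len(pattern) + 1):
        errors = 0
        for i, p in enumerate(pattern):
            j = offset + i
            if j < 0:
                continue  # overhanging pattern characters are free
            if p == "X" or p != read[j]:
                errors += 1  # X never matches a nucleotide
        if errors <= max_errors:
            hits.append(offset)
    return hits


adapter = "ACGTACGT"
internal = "GGGGACGTACGTTTTT"  # full adapter inside the read
eaten = "TACGTGGGG"            # adapter overlaps the read start

print(find_padded(adapter, internal, n_x=0, max_errors=1))
print(find_padded(adapter, internal, n_x=3, max_errors=1))
print(find_padded(adapter, eaten, n_x=3, max_errors=1))
```

With three X characters and one allowed error, the internal occurrence is no longer reported in this model, because placing the pattern inside the read would cost three errors. The overlapping occurrence is still found, since the X characters hang off the start of the read.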