Is there any way to extract a sequence between two other sequences and allow for mismatches?
loukesio opened this issue · 1 comments
First of all, thank you for maintaining and developing Biostrings; you make our lives better every day.
The following might not be an issue as such; it is mostly a request for help. Please accept my apologies in advance for the imposition.
I have a string that looks like this:
dna_string <- DNAString("AAAAANNNNNNNNNNNNNNNNNNNNNNNNNCCCCC")
I want to find a way to extract the sequence between the left pattern (AAAAA) and the right pattern (CCCCC), allowing for mismatches in both the left and the right pattern.
I would like to set the minimum distance between left and right pattern to six and the maximum to 30.
Do you have any idea if this is possible?
I am aware of the super cool
matchLRPatterns("AAAAA", "CCCCC", 25, dna_string)
but there I can only set the maximum distance, not the minimum.
This issue is very old! I completely missed it, sorry.
How about filtering the matches returned by matchLRPatterns()
to keep only those that have a minimum width? Note that the width of a match includes both flanking patterns, not just the gap between them. Something like this:
> library(Biostrings)
> dna_string <- DNAString("GTGTGTAAAAANNNNNGTGTNNNNNNNAAAANNNNNNNCCCCCAGAG")
> matches <- matchLRPatterns("AAAA", "CCCCC", 35, dna_string)
> matches
Views on a 47-letter DNAString subject
subject: GTGTGTAAAAANNNNNGTGTNNNNNNNAAAANNNNNNNCCCCCAGAG
views:
start end width
[1] 7 43 37 [AAAAANNNNNGTGTNNNNNNNAAAANNNNNNNCCCCC]
[2] 8 43 36 [AAAANNNNNGTGTNNNNNNNAAAANNNNNNNCCCCC]
[3] 28 43 16 [AAAANNNNNNNCCCCC]
> matches[width(matches) >= 20]
Views on a 47-letter DNAString subject
subject: GTGTGTAAAAANNNNNGTGTNNNNNNNAAAANNNNNNNCCCCCAGAG
views:
start end width
[1] 7 43 37 [AAAAANNNNNGTGTNNNNNNNAAAANNNNNNNCCCCC]
[2] 8 43 36 [AAAANNNNNGTGTNNNNNNNAAAANNNNNNNCCCCC]
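To filter on the distance between the two patterns rather than on the total match width, and to tolerate mismatches, a minimal sketch along the same lines could look like the following. It relies on the max.Lmismatch and max.Rmismatch arguments of matchLRPatterns(). The names min_gap, max_gap, gap, keep, kept and inner are made up for illustration. The gap arithmetic assumes indels are left disabled (the default), so each flank match has the same length as its pattern.

```r
library(Biostrings)

# Same subject and flanking patterns as in the session above.
dna_string <- DNAString("GTGTGTAAAAANNNNNGTGTNNNNNNNAAAANNNNNNNCCCCCAGAG")
Lpattern <- "AAAA"
Rpattern <- "CCCCC"

# Hypothetical helper variables: bounds on the gap between the flanks.
min_gap <- 6L
max_gap <- 30L

# max.gaplength enforces the upper bound on the gap; max.Lmismatch and
# max.Rmismatch allow up to one mismatch in each flank (no indels).
matches <- matchLRPatterns(Lpattern, Rpattern, max_gap, dna_string,
                           max.Lmismatch = 1, max.Rmismatch = 1)

# Without indels, the gap is the view width minus both flank lengths.
gap <- width(matches) - nchar(Lpattern) - nchar(Rpattern)
keep <- gap >= min_gap
kept <- matches[keep]

# Views covering only the sequence between the two flanks.
inner <- Views(subject(kept),
               start = start(kept) + nchar(Lpattern),
               end   = end(kept) - nchar(Rpattern))
inner
```

Setting both mismatch arguments back to 0 restricts the search to exact flank matches, as in the session above.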
This is really a question about basic usage of the package. Note that those questions are better asked on the Bioconductor support site here: https://support.bioconductor.org , where they get a lot more exposure and are more likely to get quick attention.
Best,
H.