levenR
levenR provides a few functions for simple Levenshtein alignment and distance calculation, with support for multiple threads, ends-free alignment and reduced homopolymer gap costs.
To install directly from GitHub, use the devtools library and run:
devtools::install_github("sherrillmix/levenR")
An example of calculating the Levenshtein distance between several strings to make a distance matrix:
library(levenR)
seqs <- c("AAATA", "AATA", "AAAT", "ACCTA")
leven(seqs)
##      [,1] [,2] [,3] [,4]
## [1,]    0    1    1    2
## [2,]    1    0    2    2
## [3,]    1    2    0    3
## [4,]    2    2    3    0
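For readers who want to see what these numbers mean, here is a minimal sketch of the textbook dynamic-programming Levenshtein distance in plain R. It is not levenR's implementation, and `levDist` is a hypothetical helper name used only for illustration. It is much slower than the package but should give the same matrix for the four sequences above.

```r
# Textbook Levenshtein distance; levDist is a hypothetical helper, not part of levenR
levDist <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b
  d <- matrix(0L, nrow = length(x) + 1, ncol = length(y) + 1)
  d[, 1] <- 0:length(x)
  d[1, ] <- 0:length(y)
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      cost <- as.integer(x[i] != y[j])
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # match or substitution
      )
    }
  }
  d[length(x) + 1, length(y) + 1]
}

seqs <- c("AAATA", "AATA", "AAAT", "ACCTA")
# Fill the full pairwise matrix by calling levDist on every pair
distMat <- outer(seq_along(seqs), seq_along(seqs),
                 Vectorize(function(i, j) levDist(seqs[i], seqs[j])))
distMat
```

Base R's `adist(seqs)` computes the same generalized Levenshtein distance and can serve as an independent cross-check.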
An example of calculating the Levenshtein distance between several strings against a longer reference sequence:
library(levenR)
seqs <- c("AAATA", "AATA", "AAAT", "ACCTA")
ref <- "CCAAATACCGACC"
leven(seqs, ref, substring2 = TRUE)
##      [,1]
## [1,]    0
## [2,]    0
## [3,]    0
## [4,]    1
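The zeros above arise because the first three strings occur exactly somewhere inside the reference. A rough sketch of the idea, assuming `substring2 = TRUE` means that unaligned characters at either end of the second string are free: the usual dynamic-programming table is kept, but its first row is left at zero and the answer is the minimum of the last row. `levSubstring` is a hypothetical name, and this is not the package's code.

```r
# Ends-free distance of a query against any substring of ref.
# levSubstring is a hypothetical helper, not part of levenR.
levSubstring <- function(query, ref) {
  x <- strsplit(query, "")[[1]]
  y <- strsplit(ref, "")[[1]]
  d <- matrix(0L, nrow = length(x) + 1, ncol = length(y) + 1)
  d[, 1] <- 0:length(x)  # unmatched query characters still cost 1 each
  # The first row stays 0: the match may start anywhere in ref at no cost
  for (i in seq_along(x)) {
    for (j in seq_along(y)) {
      cost <- as.integer(x[i] != y[j])
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L, d[i + 1, j] + 1L, d[i, j] + cost)
    }
  }
  min(d[length(x) + 1, ])  # the match may also end anywhere in ref
}

seqs <- c("AAATA", "AATA", "AAAT", "ACCTA")
ref <- "CCAAATACCGACC"
subDist <- sapply(seqs, levSubstring, ref = ref, USE.NAMES = FALSE)
subDist
## [1] 0 0 0 1
```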
An example of calculating the Levenshtein distance between several strings against two longer reference sequences and determining the best match for each read:
library(levenR)
seqs <- c("AAATA", "AATA", "AAAT", "ACCTA")
refs <- c("CCATAATACCGACC", "GGAAATACCTA")
dist <- leven(seqs, refs, substring2 = TRUE)
apply(dist, 1, which.min)
## [1] 2 1 2 2
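Note that `which.min` returns the first minimum, so a read that is equally close to several references is quietly assigned to the earliest one. In the example above, "AATA" is an exact substring of both references, so its assignment to reference 1 is only a tie-break. A small sketch of flagging such ties, using hand-written illustrative distances rather than real levenR output:

```r
# Illustrative distances written by hand (not produced by levenR):
# rows are reads, columns are references
d <- matrix(c(1, 0, 1, 1,
              0, 0, 0, 0), ncol = 2)
best <- apply(d, 1, which.min)
# Count how many references share the minimum distance for each read
nBest <- apply(d, 1, function(x) sum(x == min(x)))
assignments <- data.frame(best = best, ambiguous = nBest > 1)
assignments
```

Reads flagged as ambiguous can then be dropped or handled separately, depending on the analysis.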
An example of calculating the Levenshtein distance between several strings to make a distance matrix while ignoring indels in long homopolymers (an error type common in 454 and Ion Torrent sequencing):
library(levenR)
seqs <- c("AAAAATA", "AAATTTTTA", "AAAAATTTA")
leven(seqs, homoLimit = 3)
##      [,1] [,2] [,3]
## [1,]    0    2    2
## [2,]    2    0    0
## [3,]    2    0    0
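One way to build intuition for this result is to truncate every homopolymer run longer than the limit before comparing. This is only a rough approximation and not how levenR handles homopolymers (the package adjusts gap costs during alignment, and the exact meaning of `homoLimit` is assumed here). `collapseHomopolymers` is a hypothetical helper. For these three sequences the collapsed strings give the same distances as above, but the two approaches are not equivalent in general.

```r
# Truncate runs of a repeated character to at most `limit` copies.
# collapseHomopolymers is a hypothetical helper, not a levenR function.
collapseHomopolymers <- function(x, limit = 3) {
  pattern <- sprintf("(.)\\1{%d,}", limit)  # one character plus `limit` or more repeats
  replacement <- strrep("\\1", limit)       # keep exactly `limit` copies
  gsub(pattern, replacement, x, perl = TRUE)
}

seqs <- c("AAAAATA", "AAATTTTTA", "AAAAATTTA")
collapsed <- collapseHomopolymers(seqs, limit = 3)
collapsed
## [1] "AAATA"   "AAATTTA" "AAATTTA"

# Plain Levenshtein distance on the collapsed strings, using base R
collapsedDist <- adist(collapsed)
collapsedDist
```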
An example of calculating the Levenshtein distance between several strings using multiple threads:
library(levenR)
seqs <- replicate(50, paste(sample(letters, 100, TRUE), collapse = ""))
system.time(leven(seqs))
##    user  system elapsed
##   0.685   0.138   0.819
system.time(leven(seqs, nThreads = 4))
##    user  system elapsed
##   0.196   0.001   0.056
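The thread count can also be picked from the machine rather than hard-coded. A small sketch, assuming it is reasonable to leave one core free; `parallel::detectCores()` is part of base R and may return `NA` on some platforms. The levenR calls are guarded so the snippet still runs where the package is not installed, and the equality check is a sanity check on the assumption that threading should not change the distances.

```r
library(parallel)

cores <- detectCores()
if (is.na(cores)) cores <- 1L    # detectCores() can fail to detect anything
nThreads <- max(1L, cores - 1L)  # leave one core free, but use at least one

seqs <- replicate(50, paste(sample(letters, 100, TRUE), collapse = ""))
if (requireNamespace("levenR", quietly = TRUE)) {
  single <- levenR::leven(seqs)
  multi <- levenR::leven(seqs, nThreads = nThreads)
  # Threading is expected to change only the speed, not the result
  stopifnot(all(single == multi))
}
```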
An example of aligning strings against a longer reference:
library(levenR)
seqs <- c("AAATA", "AATA", "AAAT", "ACCTA")
ref <- "CCAAATACCGACC"
levenAlign(seqs, ref, substring2 = TRUE)
## $ref
## [1] "CCAAATACCGACC"
##
## $align
## [1] "--AAATA------" "---AATA------" "--AAAT-------" "------ACCTA--"
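The aligned strings are padded to the length of the reference, so they can be stacked under it for a quick visual check. A short sketch that uses the output above copied in as a literal list, and also counts mismatches at the covered positions. It assumes alignments without internal gaps, as in this example.

```r
# Output copied from the example above
aligned <- list(
  ref = "CCAAATACCGACC",
  align = c("--AAATA------", "---AATA------", "--AAAT-------", "------ACCTA--")
)

# Print the reference with each aligned read underneath it
cat(c(aligned$ref, aligned$align), sep = "\n")

# Count substitutions at positions where each read covers the reference
refChars <- strsplit(aligned$ref, "")[[1]]
mismatches <- sapply(strsplit(aligned$align, ""), function(chars) {
  covered <- chars != "-"
  sum(chars[covered] != refChars[covered])
})
mismatches
## [1] 0 0 0 1
```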