longestCommonSeq implementation has exponential complexity
anschlechter opened this issue · 4 comments
longestCommonSeq is not efficient, it takes a very long time if the strings to be compared have a large distance.
The implemented algorithm has exponential complexity O(2^(n+m)).
Expected Result
predictable performance. algorithm should have O(n*m) complexity as is possible.
c.f. https://www.techiedelight.com/longest-common-subsequence/
--> translated to SCALA:
def LCSLength(X: String, Y: String): Int = {
val m = X.length
val n = Y.length
// lookup table stores solution to already computed subproblems,
// i.e., T[i][j]
stores the length of LCS of substring
// X[0…i-1]
and Y[0…j-1]
val T = Array.ofDim[Int](m + 1, n + 1)
// fill the lookup table in a bottom-up manner
for (i <- 1 to m) {
for (j <- 1 to n) {
if (X.charAt(i - 1) == Y.charAt(j - 1)) {
// if the current character of X
and Y
matches
T(i)(j) = T(i - 1)(j - 1) + 1
}
else {
// otherwise, if the current character of X
and Y
don't match
T(i)(j) = Integer.max(T(i - 1)(j), T(i)(j - 1))
}
}
}
// LCS will be the last entry in the lookup table
T(m)(n)
}
Actual Result
"unpredictable" runtime
Reproduction Steps
Compare "RUE NELSON MANDELA" with "RUE ALBERT UNGEHEUER" -> it just takes way to long to find out that these are not the same.
@anschlechter thanks for reporting the issue.
the implementation used in the library for longestCommonSeq is here: (fixed)
https://github.com/vickumar1981/stringdistance/blob/master/src/main/scala/com/github/vickumar1981/stringdistance/impl/LongestCommonSeqImpl.scala#L4
Will add that test case and compare implementations. 👍
@anschlechter published a 1.2.5-SNAPSHOT
with your changes included here: #65
Sonyatype snapshot repo: https://oss.sonatype.org/content/repositories/snapshots/com/github/vickumar1981/
Testing on my end, it looks to be much faster :). Thanks for the great test cases and the implementation fix!
If you could/would possibly pull down or use the 1.2.5-SNAPSHOT
and confirm that this fixes the issue, then I can merge these changes and include them in the 1.2.5
release on maven.
Thanks again! cheers.
Hi, I tried the 1.2.5-SNAPSHOT and it looks good to me. Thank you.
Addressed by: #66
Fixed in version 1.2.5
. Published v1.2.5
artifact to sonatype and should be available on maven central shortly, once it syncs.