Running cosine gives wrong distances when the first pattern is as long as or longer than x
jordivandooren opened this issue · 1 comments
jordivandooren commented
# first pattern is shorter than x (nchar(pattern[1]) < nchar(x)): no problem
> stringdist::afind("ab", c("a", "b"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    0    0
> stringdist::afind("ab", c("a", "c"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    0    1
> stringdist::afind("ab", c("a", "ab"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    0    0
# first pattern is as long as or longer than x (nchar(pattern[1]) >= nchar(x)): unexpected results
> stringdist::afind("ab", c("ab", "a"), method = "running_cosine")$distance
     [,1]       [,2]
[1,]    0 0.06066017
> stringdist::afind("ab", c("xx", "a"), method = "running_cosine")$distance
     [,1] [,2]
[1,]    1    1
# example where the match is wrong: pattern "b" should match the "b" in "ab" with distance 0
> stringdist::afind("ab", c("xx", "b"), method = "running_cosine")
$location
     [,1] [,2]
[1,]    1    1

$distance
     [,1] [,2]
[1,]    1    1

$match
     [,1] [,2]
[1,] "ab" "a"
The unexpected results occur on multiple versions/platforms (tested on Linux R 4.1, Windows R 4.2).
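For reference, the values I would expect can be checked with a brute-force computation in base R. This is only a minimal sketch and not how stringdist implements running_cosine: it assumes q = 1 and a window width of nchar(pattern), and the helper names (qgram_counts, cosine_dist, naive_running_cosine) are made up for illustration.

```r
# Character (q = 1) counts of a string over a fixed alphabet.
qgram_counts <- function(s, alphabet) {
  chars <- strsplit(s, "")[[1]]
  vapply(alphabet, function(ch) sum(chars == ch), numeric(1))
}

# Cosine distance between the q = 1 profiles of two strings.
cosine_dist <- function(a, b) {
  alphabet <- union(strsplit(a, "")[[1]], strsplit(b, "")[[1]])
  u <- qgram_counts(a, alphabet)
  v <- qgram_counts(b, alphabet)
  1 - sum(u * v) / sqrt(sum(u^2) * sum(v^2))
}

# Slide a window of width nchar(pattern) over x (assumed default)
# and return the window with the smallest cosine distance.
naive_running_cosine <- function(x, pattern) {
  w <- min(nchar(pattern), nchar(x))
  starts <- seq_len(nchar(x) - w + 1)
  windows <- substring(x, starts, starts + w - 1)
  d <- vapply(windows, cosine_dist, numeric(1), b = pattern)
  best <- which.min(d)
  list(location = starts[best],
       distance = unname(d[best]),
       match    = windows[best])
}

# Pattern "b" against x = "ab": best window is "b" at location 2, distance 0.
str(naive_running_cosine("ab", "b"))

# Pattern "xx" against x = "ab": no shared characters, distance 1.
str(naive_running_cosine("ab", "xx"))
```

Under these assumptions the second pattern in the last example gets location 2, distance 0 and match "b", regardless of what the first pattern is.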
jordivandooren commented
Looks similar to #96