stringdist/qgram behaviour when q > nchar(x)
markvanderloo opened this issue · 0 comments
I understand that the q-gram distance is the sum of absolute differences between the q-gram count vectors of the two strings. However, I see unexpected behavior when one of the strings is shorter than the chosen q.
For these two strings, the qgrams function returns the correct counts:
> qgrams("a", "the cat sat on the mat", q = 2)
   th he t  sa on n  ma e   c ca at  s  t  o  m
V1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
V2  2  2  2  1  1  1  1  2  1  1  3  1  1  1  1
The stringdist function returns:
> stringdist("a", "the cat sat on the mat", q = 2, method = "qgram")
[1] Inf
Instead of returning:
> sum(qgrams("a", "the cat sat on the mat", q = 2)[2,])
[1] 21
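For reference, here is a minimal base-R sketch of the computation I would expect, assuming a string shorter than q simply contributes an empty q-gram profile. The helpers `qgrams_of` and `qgram_dist` are hypothetical names used only for illustration; they are not part of stringdist and say nothing about how the package computes the distance internally.

```r
# Hypothetical helper (not part of stringdist): list all q-grams of one string.
qgrams_of <- function(s, q) {
  n <- nchar(s)
  if (n < q) return(character(0))  # assumption: no q-grams fit, so the profile is empty
  starts <- seq_len(n - q + 1)
  substring(s, starts, starts + q - 1)
}

# Hypothetical helper: sum of absolute differences between q-gram counts.
qgram_dist <- function(a, b, q) {
  ga <- qgrams_of(a, q)
  gb <- qgrams_of(b, q)
  grams <- union(ga, gb)  # every q-gram seen in either string
  ca <- tabulate(match(ga, grams), nbins = length(grams))
  cb <- tabulate(match(gb, grams), nbins = length(grams))
  sum(abs(ca - cb))
}

qgram_dist("a", "the cat sat on the mat", q = 2)
# [1] 21
```

Under that assumption the result is 21 because the 22-character string has 21 overlapping 2-grams and "a" has none, which matches the column sum of the qgrams output.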
Originally posted on Stack Overflow by Giora Simchoni.