Find most similar word among target and list of words
First of all, let me congratulate the dev for this amazing library.
I was wondering whether the library already has a function that finds the word in a vocabulary that is most similar to a target word. For example:
Target word: tsring
Vocabulary: ['hello', 'world', 'string', 'foo', 'bar']
So maybe something like:
jw = JaroWinkler()
jw.most_similar('tsring', ['hello', 'world', 'string', 'foo', 'bar'])
[1] 'string'
I've tried the same construction with the distance and similarity methods, but although no error is thrown, the operation does not seem to be supported.
jw.distance('tsring', ['hello', 'world', 'string', 'foo', 'bar'])
[1] 1.0
jw.similarity('tsring', ['hello', 'world', 'string', 'foo', 'bar'])
[2] 0.0
I know it's trivial to implement a standalone function with this behavior on top of the distance or similarity functions, but I'm asking just in case a highly optimized function is already implemented :)
Thanks in advance!
In case someone is interested in doing it in a naïve way (as a workaround):
Function definition:
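import numpy as np

def most_similar(target, vocab, method):
    # Score every vocabulary word against the target
    sims = [method.similarity(target, word) for word in vocab]
    # Return the word with the highest similarity score
    return vocab[np.argmax(sims)]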
Usage:
target = 'tsring'
vocab = ['hello', 'world', 'string', 'foo', 'bar']
jw = JaroWinkler()
most_similar(target, vocab, jw)
[1] 'string'
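For anyone who would rather avoid the NumPy dependency, here is a minimal sketch of the same idea using only the built-in max(). It takes any two-argument similarity callable, so it is not tied to one metric. The ratio helper below is a stand-in built on the standard library's difflib.SequenceMatcher, used only so the snippet runs on its own. It is not part of this library, and its scores will differ from Jaro-Winkler.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence


def most_similar(target: str, vocab: Sequence[str],
                 similarity: Callable[[str, str], float]) -> str:
    """Return the vocab entry with the highest similarity to target."""
    if not vocab:
        raise ValueError("vocab must not be empty")
    # max() with a key scores each word once; ties go to the earliest word.
    return max(vocab, key=lambda word: similarity(target, word))


def ratio(a: str, b: str) -> float:
    # Stand-in similarity (assumption for this example only): difflib's
    # matching-block ratio, in the range [0, 1].
    return SequenceMatcher(None, a, b).ratio()


vocab = ['hello', 'world', 'string', 'foo', 'bar']
best = most_similar('tsring', vocab, ratio)
print(best)
```

With the objects from the usage above, the call would presumably be most_similar(target, vocab, jw.similarity), since the workaround already calls method.similarity(target, word) with two strings.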
You can always build high-level APIs on top of similarity and distance yourself.