terasum/js-mdict

fuzzy_search returns an empty array for some keywords

Closed this issue · 16 comments

Thanks a lot for your project.
I found that the fuzzy_search method often doesn't return the correct results that other mdict apps or websites do. It works well for some words but not for others.

For example, the mdx dictionary contains many keys that start with 'on', but the fuzzy_search method returns an empty array, even though I did pass the fuzzy_size and ed_gap params.

I tried to figure out the problem, but the MdictBase file is too hard for me to follow.

@danjame Hi, thank you very much for your feedback. Could you provide a download link for the dictionary file, along with the words you searched for, the results you expected, and the results you actually got?

Cloud drive (网盘), extraction code: iztp

  1. For 红葡.mdx:
  • Input: "ir"
  • Expected: [ir, ira, iracarura, iracundo, iracundia, irade, irado, iraniano, iraquiano, ...]
    (every returned key starts with the input value, and the keys come back in the same order as in the text file converted from the mdx file, just as other mdict apps return them)
  • Got: [iperite, iperlta, ipu, ipueira, ipé, ipê, ir, ira, irade, irado, ...]
  2. For 袖珍.mdx:
  • Input: " 我们"
  • Expected: [我们, 我们的, 我的, ...]
  • Got: [我们, 我们的, 我的, 我们, 我们的, 我的, ...]
    (the same keyword appears twice, three times, or more, depending on the fuzzy_size passed; I have confirmed that the dictionary itself contains no duplicate keys)
  3. For 红葡.mdx:
  • Input: "on"
  • Expected: [onanismo, oncologia, onda, onde, ondear, ondular, ondulacao, onerar, ...]
  • Got: [númen, númeno, núncia, núncio, o, oanaçu, oasiano, oba, ...]
    (no returned keyword starts with "on" or even contains the input value)

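@danjame Hi, js-mdict's fuzzy search uses the Levenshtein distance algorithm, not a prefix search. Any word that can be transformed into the query within ed_distance edit steps counts as a valid match, so the search can surface more similar words. Because your search keywords ("ir", "on") are so short, a lot of words satisfy that condition. You could try making the search term longer.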
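
To illustrate why a two-letter query matches so many unrelated keys under an edit-distance criterion, here is a minimal, self-contained sketch. It is not js-mdict's actual implementation: `levenshtein`, `withinDistance`, and the sample key list are hypothetical names and data chosen for illustration, with keys borrowed from the results reported above.

```javascript
// Classic dynamic-programming Levenshtein distance (two-row variant).
// Illustrative only; js-mdict's internal implementation may differ.
function levenshtein(a, b) {
  const s = Array.from(a); // iterate by code point rather than UTF-16 unit
  const t = Array.from(b);
  // prev[j] = distance between the empty prefix of s and the first j chars of t
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i]; // deleting i characters turns s[0..i) into ""
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// Hypothetical helper: keep every key within maxDistance edits of the query.
function withinDistance(keys, query, maxDistance) {
  return keys.filter((key) => levenshtein(key, query) <= maxDistance);
}

const sampleKeys = ["o", "oba", "ipu", "ir", "ira", "onda", "onde"];
console.log(withinDistance(sampleKeys, "on", 2));
// → [ 'o', 'oba', 'ir', 'onda', 'onde' ]
```

With a limit of 2 edits, "oba" and "ir" are as close to "on" as "onda" is, which is why a short query returns neighbours that do not start with the input.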

@danjame If you want a prefix search, try the prefix function. It should meet your needs.

I have tried the prefix function as well. For example, with the input "on" it only returns "o".

@terasum With the prefix function: 我们 => [我, 我们]. Am I perhaps using it incorrectly?

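@danjame The issue is most likely the definition of the prefix function. In js-mdict, prefix returns the prefixes of the word being queried. For example, if the word you want to look up is abc, the prefixes are a, ab, and abc. As I understand it, though, you want the query to be the prefix of the result words, for example a result such as abcd.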
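
The difference between the two meanings of "prefix" can be sketched as follows. This is only an illustration over a plain array of keys: `prefixesOfQuery`, `keysStartingWith`, the `limit` parameter, and the sample keys are hypothetical, and the code is neither js-mdict's implementation nor the method from the pull request mentioned later in the thread.

```javascript
// Semantics described above for the existing prefix function:
// return each prefix of the query that exists as a key.
function prefixesOfQuery(keys, query) {
  const keySet = new Set(keys);
  const chars = Array.from(query); // code points, so CJK and accents are safe
  const result = [];
  for (let n = 1; n <= chars.length; n++) {
    const candidate = chars.slice(0, n).join("");
    if (keySet.has(candidate)) result.push(candidate);
  }
  return result;
}

// Semantics requested in this issue: return keys that start with the query,
// in dictionary order, without duplicates, up to `limit` entries.
function keysStartingWith(keys, query, limit = 10) {
  const seen = new Set();
  const result = [];
  for (const key of keys) {
    if (key.startsWith(query) && !seen.has(key)) {
      seen.add(key);
      result.push(key);
      if (result.length >= limit) break;
    }
  }
  return result;
}

const demoKeys = ["o", "oba", "onanismo", "onda", "onde", "ondear"];
console.log(prefixesOfQuery(demoKeys, "on"));  // → [ 'o' ]
console.log(keysStartingWith(demoKeys, "on")); // → [ 'onanismo', 'onda', 'onde', 'ondear' ]
```

The first function reproduces the "on" => "o" behaviour reported above when "on" itself is not a key; the second matches the expected results listed earlier in the issue. A linear scan is used here for clarity; on a sorted key list, a binary search for the first matching key would presumably be the faster choice.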

Yes, it looks like that is the opposite of what I need.

@danjame I'll find some time in the next couple of days to implement the feature you described.

@terasum I've opened a pull request with a method for this. Could you take a look and see whether this approach works?

@danjame Hi, I've merged the code, but I don't have a test environment at hand right now. Could you help me test the code that was just merged?

@terasum Sure, I can do that.

@terasum I set up the environment, built the package into lib, and tested it in my original project. I haven't found any problems so far.

@danjame Thank you very much for your support. I'll publish it to the npm registry shortly.