infinilabs/analysis-pinyin

youxiangongsi is tokenized incorrectly

Opened this issue · 2 comments

    {
        "token": "you",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 0
    },
    {
        "token": "xiang",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 1
    },
    {
        "token": "o",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 2
    },
    {
        "token": "n",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 3
    },
    {
        "token": "g",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 4
    },
    {
        "token": "si",
        "start_offset": 0,
        "end_offset": 0,
        "type": "word",
        "position": 5
    }
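
The token list above (you, xiang, o, n, g, si) is what a forward longest-match pass over a syllable table would produce. Whether the plugin actually works this way is an assumption, not something confirmed in this thread. The sketch below uses a deliberately tiny, hypothetical syllable table (`SYLLABLES`) and two illustrative functions (`greedy_segment`, `backtracking_segment`) to show how the greedy pass gets stuck after taking "xiang", and how backing off to "xian" lets the rest be covered as "gong" + "si".

```python
from functools import lru_cache

# Hypothetical, deliberately tiny syllable table (a real one has ~400 entries).
SYLLABLES = {"you", "xi", "xian", "xiang", "an", "ang", "o", "gong", "si"}
MAX_LEN = max(len(s) for s in SYLLABLES)


def greedy_segment(text):
    """Forward longest-match; unmatched characters are emitted one at a time."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            if text[i:i + size] in SYLLABLES:
                out.append(text[i:i + size])
                i += size
                break
        else:
            # No syllable starts at this position: emit the single character.
            out.append(text[i])
            i += 1
    return out


def backtracking_segment(text):
    """Longest-first search that backs off when the remainder cannot be covered.

    Returns None if no full segmentation into known syllables exists.
    """

    @lru_cache(maxsize=None)
    def solve(i):
        if i == len(text):
            return ()
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in SYLLABLES:
                rest = solve(i + size)
                if rest is not None:
                    return (piece,) + rest
        return None

    result = solve(0)
    return list(result) if result is not None else None


print(greedy_segment("youxiangongsi"))
print(backtracking_segment("youxiangongsi"))
```

With this table the greedy pass reproduces the reported tokens, while the backtracking pass yields you, xian, gong, si. Input that cannot be fully covered (for example initials such as zxc) would still need a separate fallback.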

I'm running into this too: zxc, the initials abbreviation of 周星驰 (Zhou Xingchi), gets split into individual single characters.

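I hit this as well. Suppose the pinyin is deliberately separated by spaces, for example: ying lun mi an. The pinyin tokenizer should then produce ying lun mi an, rather than the current ying lun mian, where mi and an get glued together.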
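
A minimal sketch of the difference being asked for, assuming (not confirmed here) that the current behavior is equivalent to dropping whitespace before matching syllables. `SYLLABLES`, `segment`, `strip_then_segment`, and `split_then_segment` are hypothetical names for illustration only. They are not part of the plugin.

```python
# Hypothetical mini syllable table, just large enough for this example.
SYLLABLES = {"ying", "lun", "mi", "an", "mian"}
MAX_LEN = max(len(s) for s in SYLLABLES)


def segment(text):
    """Forward longest-match over a string that contains no whitespace."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            if text[i:i + size] in SYLLABLES:
                out.append(text[i:i + size])
                i += size
                break
        else:
            out.append(text[i])  # unknown character, emitted on its own
            i += 1
    return out


def strip_then_segment(text):
    """Drop the spaces first, so the user's boundaries are lost."""
    return segment("".join(text.split()))


def split_then_segment(text):
    """Treat spaces as hard boundaries and segment each chunk separately."""
    tokens = []
    for chunk in text.split():
        tokens.extend(segment(chunk))
    return tokens


print(strip_then_segment("ying lun mi an"))
print(split_then_segment("ying lun mi an"))
```

The first call merges mi and an into mian. The second keeps the four syllables the user typed, because no match can cross a space.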