Some Chinese words are not included in the module

Question

LLauryn opened this issue 5 years ago · 3 comments

The Chinese grapheme-to-phoneme conversion library is incomplete. Here are some of the missing Chinese characters I have found so far: 邓,吴,鄂,皖,蔡,萨,廖,宋,秦,刘,滧,闫,陕,郑,郝,犇,鹏,陇,祾,渭,邹,濮,梵,佟,韩,龚,洛,湘,婍,沂,隋,洣,潘,蒋,禹,喲,闽,湳,綪,睍,孻,汶,杭,吶,黔,渝,辽,銶,滇,灞,溁,浙,渤,邵,赣,淮,郸,彭,傣,蜀,沪,癍,郦,滕,滦,榣,姈,亳,漳,邢,涪,尧,昝,羲,媃,粤,鞑
from g2pc import G2pC
g2p = G2pC()
print(g2p("吴"))
For example, when I input the text "***", the result for "邓" is ('邓', 'nr', '邓', '邓', '', '邓').
When I input "吴", the result is ('吴', 'nr', '吴', '吴', '', '吴'), and so on: the pinyin fields simply repeat the input character.
All of the characters listed above show the same problem as these examples.
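
A list like the one above could be checked automatically. The following is only a minimal sketch: `find_unconverted` is a hypothetical helper, not part of g2pc, and it assumes the converter returns a list of tuples laid out as in the output above, with the word at index 0 and the pinyin at index 2.

```python
def find_unconverted(chars, convert):
    """Return the characters whose pinyin field just echoes the character.

    `convert` is any callable with the assumed g2pc-like shape: it takes a
    string and returns a list of tuples (word, pos, pinyin, ...).
    """
    missing = []
    for ch in chars:
        for token in convert(ch):
            # Assumption based on the output shown in this issue:
            # index 0 is the word, index 2 is the pinyin.
            word, pinyin = token[0], token[2]
            if word == pinyin:
                # The converter returned the input unchanged, so no reading was found.
                missing.append(word)
    return missing
```

With the library installed, it might be called as `find_unconverted("邓吴鄂皖蔡", G2pC())`. Punctuation and other non-Chinese tokens could also be echoed unchanged, so the result may need filtering.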

Answer 1 · 2019-07-09T01:16:02.000Z

Thanks. Most of them are used in names. I fixed the bug, so please update the library and check the new results. Some of the characters are still missing because they are not in cedict. I will look for a solution to this in the near future.
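
Until characters absent from cedict are covered, one possible stopgap on the user side is a manual override table. This is a sketch under assumptions, not the library's own fix: `with_overrides` and the table are hypothetical, the tuple layout (word at index 0, pinyin at index 2) is taken from the output quoted in this issue, and the tone-number spelling of the readings is assumed to match what the library emits.

```python
def with_overrides(convert, overrides):
    """Wrap a converter so that unconverted tokens are filled from a manual table.

    `convert` is any callable returning a list of tuples (word, pos, pinyin, ...);
    `overrides` maps a character to the reading to use when none was found.
    """
    def wrapped(text):
        fixed = []
        for token in convert(text):
            word, pinyin = token[0], token[2]
            if word == pinyin and word in overrides:
                # Replace only the pinyin field (index 2); keep the other fields.
                token = token[:2] + (overrides[word],) + token[3:]
            fixed.append(token)
        return fixed
    return wrapped


# Hypothetical table; the readings are supplied by the user, not by the library.
OVERRIDES = {"吴": "wu2", "邓": "deng4"}
```

With the library installed, `g2p = with_overrides(G2pC(), OVERRIDES)` would then be used in place of the plain converter.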

Answer 2 · 2019-10-12T09:42:27.000Z

@Kyubyong
Thanks for your impressive work. I also found that some Chinese characters are not included in the module, such as "琊". Could you update the library to include these missing characters?

Thanks

Answer 3 · 2019-10-12T09:47:35.000Z

Another question: how many Chinese words are included in the model? Could you include the full Chinese dictionary? Thanks