Kyubyong/g2pC

Some Chinese words are not included in the module

LLauryn opened this issue · 3 comments

The Chinese grapheme-to-phoneme conversion library is incomplete. Here are some of the Chinese characters I found to be missing: 邓,吴,鄂,皖,蔡,萨,廖,宋,秦,刘,滧,闫,陕,郑,郝,犇,鹏,陇,祾,渭,邹,濮,梵,佟,韩,龚,洛,湘,婍,沂,隋,洣,潘,蒋,禹,喲,闽,湳,綪,睍,孻,汶,杭,吶,黔,渝,辽,銶,滇,灞,溁,浙,渤,邵,赣,淮,郸,彭,傣,蜀,沪,癍,郦,滕,滦,榣,姈,亳,漳,邢,涪,尧,昝,羲,媃,粤,鞑
from g2pc import G2pC
g2p = G2pC()
print(g2p("吴"))
For example, when I input the text "***", the result for "邓" is ('邓', 'nr', '邓', '邓', '', '邓').
When I input "吴", the result is ('吴', 'nr', '吴', '吴', '', '吴'), and so on.
All of the characters I listed have the same problem as the examples above: the output just repeats the input character instead of giving a pinyin reading.
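
To check a longer text for this problem, something like the following might help. It is only a rough sketch: it assumes each result is a tuple shaped like the ones above, with the token in the first field and the pinyin in the third, and it treats an entry as missing when the pinyin field is identical to the token. `find_unconverted` is a hypothetical helper, not part of g2pC, and the third sample entry is made up for illustration rather than real library output.

```python
def find_unconverted(results):
    """Return tokens whose pinyin field is just the token echoed back.

    Assumes each entry looks like (token, pos, pinyin, ...), as in the
    tuples quoted in this issue.
    """
    missing = []
    for entry in results:
        token, pinyin = entry[0], entry[2]
        if token == pinyin:
            missing.append(token)
    return missing


sample = [
    ("吴", "nr", "吴", "吴", "", "吴"),  # from this issue
    ("邓", "nr", "邓", "邓", "", "邓"),  # from this issue
    # Hypothetical converted entry, for contrast only (not real output).
    ("好", "a", "hao3", "hao3", "", "hao3"),
]

print(find_unconverted(sample))
```

With real output, the same function could be called as `find_unconverted(g2p(text))`, assuming `g2p` returns a list of such tuples.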

Thanks. Most of them are used in names. I fixed the bug, so please update the library and check the new results. Some of them are still missing because they are not in cedict. I will look for a solution to this in the near future.

@Kyubyong
Thanks for your impressive work. I also found that some Chinese characters are not included in the module, such as "琊". Could you update it to include these missing characters?

Thanks.

Another question: how many Chinese words are included in the model? Could you include the full Chinese dictionary? Thanks.