A Python interface for Kuromoji, the Java-based Japanese morphological analyzer (powered by Py4J).
$ pip install git+https://github.com/kirk3110/kuromojipy
or
$ cd yourworkspace
$ git clone https://github.com/kirk3110/kuromojipy.git
$ cd kuromojipy
$ python setup.py install
- Java 6.0+
- Python 2.7+/3.4+
- Py4J
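Before installing, it can help to confirm that these prerequisites are visible from Python. The snippet below is only a rough sketch, not part of kuromojipy: the `check_requirements` helper is a hypothetical name, it runs on Python 3.4+ only, and it checks merely that a `java` executable is on the `PATH` without verifying its version.

```python
import importlib.util
import shutil
import sys


def check_requirements():
    """Return a dict mapping each prerequisite to True/False.

    Hypothetical helper for illustration; not shipped with kuromojipy.
    """
    return {
        # Only checks that a `java` executable is on PATH, not its version.
        'java': shutil.which('java') is not None,
        # This script itself needs Python 3.4+ (shutil.which, find_spec).
        'python': sys.version_info >= (3, 4),
        # True if the py4j package can be located by the import system.
        'py4j': importlib.util.find_spec('py4j') is not None,
    }


if __name__ == '__main__':
    for name, ok in sorted(check_requirements().items()):
        print('{}: {}'.format(name, 'found' if ok else 'MISSING'))
```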
When you execute the following code...
from kuromojipy.kuromoji_server import KuromojiServer

with KuromojiServer() as kuro_server:
    kuromoji = kuro_server.kuromoji
    tokenizer = kuromoji.Tokenizer.builder().build()
    tokens = tokenizer.tokenize(u'お寿司が食べたい。')
    for token in tokens:
        print(token.getSurfaceForm() + '\t' + token.getAllFeatures())
you will get the following output.
お	接頭詞,名詞接続,*,*,*,*,お,オ,オ
寿司	名詞,一般,*,*,*,*,寿司,スシ,スシ
が	助詞,格助詞,一般,*,*,*,が,ガ,ガ
食べ	動詞,自立,*,*,一段,連用形,食べる,タベ,タベ
たい	助動詞,*,*,*,特殊・タイ,基本形,たい,タイ,タイ
。	記号,句点,*,*,*,*,。,。,。
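The comma-separated string returned by `getAllFeatures()` can be split into named fields for easier handling. The following is a minimal sketch, assuming the nine-column IPADIC layout that the output above appears to follow. The `Features` tuple, its field names, and `parse_features` are illustrative and not part of kuromojipy; rows with fewer than nine columns are padded with `*` as a precaution.

```python
# -*- coding: utf-8 -*-
from collections import namedtuple

# Illustrative field names for the nine IPADIC feature columns
# (assumed layout; these names are not defined by kuromojipy).
Features = namedtuple('Features', [
    'pos', 'pos_detail_1', 'pos_detail_2', 'pos_detail_3',
    'conjugation_type', 'conjugation_form',
    'base_form', 'reading', 'pronunciation',
])


def parse_features(all_features):
    """Split a feature string such as token.getAllFeatures() into fields."""
    parts = all_features.split(',')
    width = len(Features._fields)
    # Pad short rows with '*' and ignore any extra columns.
    parts = (parts + [u'*'] * width)[:width]
    return Features(*parts)


feats = parse_features(u'動詞,自立,*,*,一段,連用形,食べる,タベ,タベ')
print(feats.pos + u'\t' + feats.base_form)
```

Inside the tokenizer loop, the same helper could be applied as `parse_features(token.getAllFeatures())`.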
Thanks!