KOrean Rpc-based Handy Application for Language processing

Korhal

Korhal (KOrean Rpc-based Handy Application for Language processing) is a Python wrapper for several Korean part-of-speech taggers.

How to install

pip install korhal

Available taggers

  • KOMORAN with korhal.komoran
  • Hannanum with korhal.hannanum
  • Open-source Korean Text Processor with korhal.openkoreantext

How to use

from korhal.komoran import tokenize

result = tokenize("집에 가서 잠을 자고 싶다")
# result => [Token(text=집,pos=NNG), Token(text=에,pos=JKB), Token(text=가,pos=VV), Token(text=아서,pos=EC), Token(text=잠,pos=NNG), Token(text=을,pos=JKO), Token(text=자,pos=VV), Token(text=고,pos=EC), Token(text=싶,pos=VX), Token(text=다,pos=EC)]
# result is a list of tokens, so index into it to inspect a single token
print(result[0].text) # => 집
print(result[0].pos) # => NNG

nouns = [token.text for token in result if token.pos.startswith('N')]
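
The filtering idea above generalizes to any tag prefix. The following is a minimal sketch, assuming only that tokens expose `text` and `pos` as in the output shown earlier. `Token` here is a stand-in namedtuple and `extract_by_prefix` is a hypothetical helper, neither is part of korhal, and the sample list is copied by hand from the KOMORAN output above so the snippet runs without a tagger.

```python
from collections import Counter, namedtuple

# Stand-in for korhal's token type (assumption: only .text and .pos are needed).
Token = namedtuple("Token", ["text", "pos"])

# Copied by hand from the example output above, not produced by running korhal.
sample = [
    Token("집", "NNG"), Token("에", "JKB"), Token("가", "VV"), Token("아서", "EC"),
    Token("잠", "NNG"), Token("을", "JKO"), Token("자", "VV"), Token("고", "EC"),
    Token("싶", "VX"), Token("다", "EC"),
]


def extract_by_prefix(tokens, prefix):
    """Return the text of every token whose POS tag starts with `prefix`."""
    return [t.text for t in tokens if t.pos.startswith(prefix)]


nouns = extract_by_prefix(sample, "N")   # tags such as NNG
verbs = extract_by_prefix(sample, "V")   # tags such as VV and VX
tag_counts = Counter(t.pos for t in sample)  # how often each tag occurs
```

With a real tagger, `sample` would simply be replaced by the list returned from `tokenize`.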

Asynchronous methods

korhal.aio provides asynchronous versions of the same methods. On multi-core systems, they can slightly improve throughput when processing large amounts of text.

from korhal.aio.openkoreantext import tokenize
 
texts = ['달디단 맛있는 케이크가 있었다', '솜사탕 같이 귀여운 구름']
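# Start every request first so they can run concurrently
futures = [tokenize(text) for text in texts]
# Then wait for each result, in the same order as texts
results = [f.result() for f in futures]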
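
The submit-everything-then-collect pattern above can be tried out with the standard library alone. This is a rough sketch, not korhal's implementation: `fake_tokenize` is a hypothetical stand-in that only splits on whitespace, and the futures come from `concurrent.futures`, which may differ from the future type korhal.aio actually returns.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_tokenize(text):
    """Hypothetical stand-in for a tagger call: splits on whitespace only."""
    return text.split()


texts = ['달디단 맛있는 케이크가 있었다', '솜사탕 같이 귀여운 구름']

with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit all jobs before waiting on any of them, so the calls can overlap.
    futures = [pool.submit(fake_tokenize, text) for text in texts]
    # Collect results in input order; each .result() blocks until its job is done.
    results = [f.result() for f in futures]
```

Calling `.result()` inside the same loop that submits the work would serialize the calls, which is why the two steps are kept separate.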

Thanks to