a-first-two-char-input-method

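This project uses the Kyoto Text Analysis Toolkit (KyTea).
It builds a system that treats the input as a hint string made of two-character chunks (a chunk may also be a single character in practice) and outputs the sentence formed by the most likely combination of words.
Only English text is supported.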
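
To make the hint format concrete, here is a minimal sketch of how a sentence could be reduced to its hint string. It is not taken from this repository's scripts: the function name `to_hint`, the parameter `prefix_len`, and the regex-based tokenization are all assumptions made for illustration.

```python
import re


def to_hint(sentence: str, prefix_len: int = 2) -> str:
    """Concatenate the first `prefix_len` characters of every token.

    Hypothetical helper: tokens are runs of word characters, and each
    punctuation mark is treated as its own token (an assumption).
    """
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    return "".join(token[:prefix_len] for token in tokens)


print(to_hint("This is a pen."))  # Thisape.
```

Words shorter than the prefix length, such as "a", contribute all of their characters, which is why a chunk can be a single character.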

For example, with the pretrained model sample.dat, the following result is obtained:

$ python3 run.py -model sample.dat
Thisape.
Th/there is/is a/a pe/people ./.

(sample.dat can be downloaded here.)

Usage

Create a corpus from whatever documents you need (English only).
The documents must be formatted as follows, with one sentence per line:

"""
This is a pen.
How are you today.
Children are playing outside.
...
"""

$ echo "This is a pen." > test.txt

$ python3 make_corpus.py test.txt

$ cat corpus/test.txt  
Th/This is/is a/a pe/pen ./.
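
The corpus line above pairs each two-character prefix with its full word, separated by a slash, with pairs separated by spaces. The following is a minimal sketch of that conversion, not the actual contents of make_corpus.py: the function name `to_corpus_line` and the tokenization rule are assumptions, and tokens that themselves contain a slash or a space are not handled.

```python
import re


def to_corpus_line(sentence: str, prefix_len: int = 2) -> str:
    """Turn one sentence into a 'prefix/word' line.

    Hypothetical helper: the prefix acts as the surface form and the
    full word acts as its tag.
    """
    # Split into word tokens and single punctuation marks (an assumption).
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    return " ".join(f"{token[:prefix_len]}/{token}" for token in tokens)


print(to_corpus_line("This is a pen."))  # Th/This is/is a/a pe/pen ./.
```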

Train a KyTea model and save it as model.dat.

$ train-kytea -full corpus/test.txt -model model.dat

Convert an input hint string into the most likely sentence.

$ python3 run.py -model model.dat  
Thisape.
Th/This is/is a/a pe/pen ./.
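
If a plain sentence is needed rather than the prefix/word pairs, the output line can be post-processed. The sketch below is an illustration only and is not part of run.py: the function name `decode` is hypothetical, and it assumes each pair has the form prefix/word with no slash inside the prefix.

```python
import re


def decode(tagged_line: str) -> str:
    """Recover a plain sentence from a 'prefix/word' output line."""
    # Keep the part after the first slash of every pair.
    words = [pair.split("/", 1)[1] for pair in tagged_line.split()]
    sentence = " ".join(words)
    # Attach common punctuation marks to the preceding word.
    return re.sub(r"\s+([.,!?;:])", r"\1", sentence)


print(decode("Th/This is/is a/a pe/pen ./."))  # This is a pen.
```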

Install the Python dependencies listed in requirements.txt:

$ pip3 install -r requirements.txt