SimpleTokenizer

Natural language processing research

Requirement: Java 1.7

Tokenizer.jar is a runnable program that reads a string from standard input and prints its tokens, one per line.
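
If Tokenizer.jar is packaged as a standard runnable jar, it would normally be started with `java -jar Tokenizer.jar`, with the text supplied on standard input. The source of the jar is not shown here, so the following is only a minimal sketch of a tokenizer with the same input and output shape. The class name `TokenizerSketch` and the splitting rule (lowercase the text, then split on anything that is not a letter or digit) are assumptions for illustration, not the actual implementation. In particular, the sketch does no Chinese word segmentation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; this is not the source of Tokenizer.jar.
class TokenizerSketch {

    // Lowercase the text and split it on runs of characters that are
    // neither letters nor digits. Empty pieces are dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        String lowered = text.toLowerCase(Locale.ROOT);
        for (String piece : lowered.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Read standard input line by line and print one token per line.
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            for (String token : tokenize(line)) {
                System.out.println(token);
            }
        }
    }
}
```

The sketch uses no language features newer than Java 1.7. A run of Chinese characters such as 中文斷詞 comes out as a single token, which may differ from what Tokenizer.jar produces.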

input: "Hello World. 中文斷詞"

output:
hello
world