A Clojure library for the Japanese morphological analyzer MeCab.
Install MeCab
- Run the example below to check that MeCab has been installed correctly:
$ echo すもももももももものうち | mecab
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
も 助詞,係助詞,*,*,*,*,も,モ,モ
もも 名詞,一般,*,*,*,*,もも,モモ,モモ
の 助詞,連体化,*,*,*,*,の,ノ,ノ
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ
EOS
Add the dependency to your Leiningen project.clj:
[jah524/clojure-mecab "0.3.0"]
(use 'clojure-mecab)
(parse "すもももももももものうち")
;=> [["すもも" "名詞" "一般" "*" "*" "*" "*" "すもも" "スモモ" "スモモ"]
; ["も" "助詞" "係助詞" "*" "*" "*" "*" "も" "モ" "モ"]
; ["もも" "名詞" "一般" "*" "*" "*" "*" "もも" "モモ" "モモ"]
; ["も" "助詞" "係助詞" "*" "*" "*" "*" "も" "モ" "モ"]
; ["もも" "名詞" "一般" "*" "*" "*" "*" "もも" "モモ" "モモ"]
; ["の" "助詞" "連体化" "*" "*" "*" "*" "の" "ノ" "ノ"]
; ["うち" "名詞" "非自立" "副詞可能" "*" "*" "*" "うち" "ウチ" "ウチ"]]
(extract-words "すもももももももものうち")
;=> ["すもも" "も" "もも" "も" "もも" "の" "うち"]
(extract-words "すもももももももものうち" ["名詞"] [])
;=> ["すもも" "もも" "もも" "うち"]
(extract-words "すもももももももものうち" [] ["名詞"])
;=> ["も" "も" "の"]
(extract-words "すもももももももものうち" ["名詞"] ["非自立"])
;=> ["すもも" "もも" "もも"]
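Judging from the examples above, the second argument of extract-words appears to act as an include list and the third as an exclude list. The following is a minimal sketch of that filtering over the vectors returned by parse. It is not the library's actual implementation: the function name `filter-surfaces` is hypothetical, and matching against every feature column (rather than only the part-of-speech columns) is an assumption.

```clojure
;; Hypothetical helper, not part of clojure-mecab.
;; Each token is a vector: [surface feature1 feature2 ...], as returned by parse.
(defn filter-surfaces
  "Return the surface forms of tokens that match `include` (or all tokens
  when `include` is empty) and contain no feature listed in `exclude`."
  [tokens include exclude]
  (let [include (set include)
        exclude (set exclude)]
    (->> tokens
         (filter (fn [[_surface & features]]
                   (and (or (empty? include) (some include features))
                        (not-any? exclude features))))
         (mapv first))))

;; Sample data copied from the parse output above (shortened to three tokens).
(def sample-tokens
  [["すもも" "名詞" "一般" "*" "*" "*" "*" "すもも" "スモモ" "スモモ"]
   ["も" "助詞" "係助詞" "*" "*" "*" "*" "も" "モ" "モ"]
   ["うち" "名詞" "非自立" "副詞可能" "*" "*" "*" "うち" "ウチ" "ウチ"]])

(filter-surfaces sample-tokens ["名詞"] ["非自立"])
;=> ["すもも"]
```

With an empty include list every token is a candidate, so `(filter-surfaces sample-tokens [] ["名詞"])` keeps only the particle.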
This library calls the mecab command through clojure.java.shell/sh,
so you do not need Java JNI bindings.
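To illustrate the approach, here is a rough sketch of how shelling out to mecab might look. It assumes the mecab executable is on the PATH and prints its default output format (surface form, a tab, then comma-separated features, ending with an EOS line). The names `parse-output` and `run-mecab` are hypothetical and are not the library's API.

```clojure
(require '[clojure.java.shell :refer [sh]]
         '[clojure.string :as str])

;; Hypothetical helper: convert raw mecab output into token vectors.
(defn parse-output
  "Turn raw mecab output into one vector per token:
  [surface feature1 feature2 ...]. Blank lines and EOS markers are dropped."
  [out]
  (->> (str/split-lines out)
       (remove #(or (str/blank? %) (= "EOS" %)))
       (mapv (fn [line]
               ;; Split on the first tab only: surface form vs. feature string.
               (let [[surface features] (str/split line #"\t" 2)]
                 (if features
                   (into [surface] (str/split features #","))
                   [surface]))))))

;; Hypothetical helper: run the external mecab process on a string.
(defn run-mecab
  "Feed `text` to mecab on stdin and return the parsed tokens.
  Throws if the process exits with a non-zero status."
  [text]
  (let [{:keys [exit out err]} (sh "mecab" :in (str text "\n"))]
    (if (zero? exit)
      (parse-output out)
      (throw (ex-info "mecab failed" {:exit exit :err err})))))
```

Because the work happens in an external process, the only runtime requirement is a working mecab installation; the trade-off is the cost of starting a process for each call.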
If you want to deploy your application to a PaaS such as Heroku, where installing the mecab binary may be difficult, you had better use Kuromoji (a Java implementation) instead.
However, MeCab is much faster than Kuromoji, so prefer MeCab when you process large volumes of data.
Copyright © 2018 Jah524
Distributed under the Eclipse Public License, either version 1.0 or (at your option) any later version.