Issues
Kuromoji_tokenizer: sort clause does not seem to work for some specific character combinations
#141 opened by ajaypvymo - 0
Kanji penalty and other penalty
#140 opened by elialm7 - 0
Handling of userDictionary comments
#139 opened by tottokug - 0
ソーシャルメディア is not tokenized into two words
#137 opened by hohno-panopto - 0
How to enable discardPunctuation in Kuromoji Java
#134 opened by yanghanxy - 0
how to increase heap size other than MAVEN_OPS
#133 opened by kazukousen - 2
Next release?
#131 opened by mpriala-code - 3
Configuring with Maven
#132 opened by Zurdge - 2
Optimization opportunity in the fst usage.
#130 opened by fulmicoton - 0
Kuromoji POS Train
#129 opened by abhinandansrivastava - 5
Normalized surface in user dictionary.
#126 opened by mrikitoku - 4
Unidic design flaw
#118 opened by wareya - 2
Compound word with nakaguro in it
#104 opened by mhko - 1
Nexus Repository is Offline?
#123 opened by ryantenney - 3
Obtain furigana?
#121 opened by 0x6C38 - 4
How to use Kuromoji in Gradle?
#120 opened by weituotian - 9
Kuromoji on Android
#96 opened by jendib - 1
Internals documentation and academic papers?
#117 opened by DarrenCook - 0
Possible Issue with tokenization when English+Japanese are adjacent in text
#116 opened by bbguitar77 - 0
Longer string in Katakana has low priority
#115 opened by oharato - 0
The tokenizing performance of mixed language
#113 opened by kwkwvenusgod - 5
Question: how to obtain multiple parsings?
#99 opened by fasiha - 0
http://www.atilika.org showcases the outdated maven artifact repository information
#111 opened by titsuki - 0
java.lang.RuntimeException: Could not load dictionaries. Caused by: java.io.IOException: Classpath resource not found: fst.bin
#110 opened by proninalex - 0
Is there any example for Lemmatization?
#107 opened by RangerWolf - 5
Tokenizing text in Hiragana character set
#105 opened by mhko - 0
Unidic Tokenization on Romaji Words
#103 opened by tobias-khs - 2
Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.4:test (default-test) on project kuromoji-benchmark: There are test failures.
#81 opened by kallewoof - 14
Very odd tokenization of a sentence
#82 opened by kallewoof - 7
Tokenizer is not serializable for Apache Spark
#85 opened by lamrongol - 3
Integration with solr
#87 opened by nisha-kajale - 2