jpstats

Tools for Japanese corpus linguistics (as a hobbyist), written in Rust.


Japanese Text Stats Tools

A command-line program containing several analysis tools for long Japanese stories and for comparing those stories with one another: counts, frequency analysis, complexity estimation, and more.
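To give a rough idea of what "counts, frequency analysis, complexity estimation" can mean for Japanese text, here is a minimal sketch. It is not how jpstats itself works: the README does not say how the tool segments text, so this sketch simply counts individual characters (Japanese has no spaces between words), and it uses the share of kanji as a crude, hypothetical complexity proxy. The function names `char_frequencies` and `kanji_ratio` are made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper: counts how often each character occurs in `text`,
/// skipping whitespace. Returns (char, count) pairs sorted by descending count,
/// with ties broken by code point so the order is deterministic.
fn char_frequencies(text: &str) -> Vec<(char, usize)> {
    let mut counts: HashMap<char, usize> = HashMap::new();
    for c in text.chars().filter(|c| !c.is_whitespace()) {
        *counts.entry(c).or_insert(0) += 1;
    }
    let mut list: Vec<(char, usize)> = counts.into_iter().collect();
    list.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    list
}

/// Hypothetical complexity proxy: the share of non-whitespace characters
/// that fall in the CJK Unified Ideographs block (U+4E00..=U+9FFF).
fn kanji_ratio(text: &str) -> f64 {
    let mut total = 0usize;
    let mut kanji = 0usize;
    for c in text.chars().filter(|c| !c.is_whitespace()) {
        total += 1;
        if ('\u{4E00}'..='\u{9FFF}').contains(&c) {
            kanji += 1;
        }
    }
    if total == 0 {
        0.0
    } else {
        kanji as f64 / total as f64
    }
}

fn main() {
    let text = "猫が好きです。猫は可愛い。";
    // Print the three most frequent characters with their counts.
    for (c, n) in char_frequencies(text).iter().take(3) {
        println!("{c}\t{n}");
    }
    println!("kanji ratio: {:.2}", kanji_ratio(text));
}
```

A real analysis of long stories would more likely work on words or morphemes from a morphological analyzer rather than single characters; character counts are used here only because they need no external dependencies.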

All the necessary functionality is present, but there is no good interface yet, not even a proper command-line one.

Existing Analyses

This repository already contains analysis results for visual novels (workspace/), stories from 小説家になろう (Shōsetsuka ni Narō) (narou/), and a pile of plaintext novels I found lying around somewhere (novels/), grouped by author rather than by series. These results include stats and merged frequency lists. The collections of individual frequency lists are quite large, so I have no plans to share them through the repository, but if you ask me I can archive and upload them for you.
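A merged frequency list can be thought of as the per-term sum of many individual lists. The sketch below illustrates that idea under an assumed plain-text format of one `term<TAB>count` entry per line; the actual file format used in this repository may differ, and `parse_freq_list` and `merge_freq_lists` are hypothetical names, not functions from jpstats.

```rust
use std::collections::HashMap;

/// Parses a frequency list in an ASSUMED "term<TAB>count" line format.
/// Lines without a tab or with a non-numeric count are skipped.
fn parse_freq_list(contents: &str) -> HashMap<String, u64> {
    let mut map = HashMap::new();
    for line in contents.lines() {
        let mut parts = line.split('\t');
        if let (Some(term), Some(count)) = (parts.next(), parts.next()) {
            if let Ok(n) = count.trim().parse::<u64>() {
                *map.entry(term.to_string()).or_insert(0) += n;
            }
        }
    }
    map
}

/// Merges several per-work frequency lists by summing each term's counts,
/// then sorts by descending count (ties broken alphabetically by term).
fn merge_freq_lists(lists: &[HashMap<String, u64>]) -> Vec<(String, u64)> {
    let mut merged: HashMap<String, u64> = HashMap::new();
    for list in lists {
        for (term, &n) in list {
            *merged.entry(term.clone()).or_insert(0) += n;
        }
    }
    let mut out: Vec<(String, u64)> = merged.into_iter().collect();
    out.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}

fn main() {
    let a = parse_freq_list("猫\t3\n犬\t1\n");
    let b = parse_freq_list("犬\t4\n鳥\t2\n");
    // Prints 犬 5, 猫 3, 鳥 2 (one per line, tab-separated).
    for (term, n) in merge_freq_lists(&[a, b]) {
        println!("{term}\t{n}");
    }
}
```

Summing raw counts weights long works more heavily than short ones; whether the repository's merged lists do this or normalize per work first is not stated in this README.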

No guarantees of any kind are made about the quality of the analysis methods used here (I'm just a hobbyist), nor about the cleanliness of the corpus material the analyses were run on.

License

Included program source code files are licensed under the Apache License, version 2.0. Copyright 2019. https://www.apache.org/licenses/LICENSE-2.0

Included data and configuration files (including analysis result data files) are released into the public domain and also licensed under the Creative Commons Zero license, any version.