
A Japanese-English dictionary builder for Kobo e-readers.

Primary language: Rust. License: Apache License 2.0 (Apache-2.0).

Kobo Japanese Dictionary Builder

A tool that generates Japanese-English dictionaries for the Kobo line of e-readers from Yomichan dictionaries.

Example usage

Typical usage looks like this:

kobo_jp_dict -y jmdict_english.zip dicthtml-ja-en.zip

This takes the Yomichan dictionary jmdict_english.zip as input and produces the Kobo dictionary file dicthtml-ja-en.zip.

You can include as many Yomichan dictionaries as you like with repeated use of the -y flag like so:

kobo_jp_dict -y yomichan_dictionary_1.zip -y yomichan_dictionary_2.zip dicthtml-ja-en.zip
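If you keep your Yomichan dictionaries in one directory, the repeated -y flags can be generated rather than typed. The following is a minimal sketch, not part of this project: with_yomichan_dicts is a hypothetical helper name, and the directory layout is an assumption. Files are added in the shell's glob order (alphabetical); whether the order of -y flags matters is not stated here.

```shell
# Hypothetical helper: run a command with one "-y FILE" pair per .zip in a directory.
# Usage: with_yomichan_dicts DIR OUTPUT COMMAND [ARGS...]
with_yomichan_dicts() {
    dir=$1
    out=$2
    shift 2
    for f in "$dir"/*.zip; do
        # Skip the unexpanded pattern when the directory has no .zip files.
        [ -e "$f" ] || continue
        # Append "-y FILE" to the command being built in the positional parameters.
        set -- "$@" -y "$f"
    done
    # Run the command with the output filename as the final argument.
    "$@" "$out"
}

# Dry run: pass "echo" first to print the command instead of running it.
# with_yomichan_dicts ./yomichan dicthtml-ja-en.zip echo kobo_jp_dict
```

Dropping the echo runs kobo_jp_dict itself with the generated arguments.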

Not all Yomichan dictionaries are supported, but JMdict, kanji, name, and most Japanese-Japanese dictionaries should at least work reasonably well.

Installing the produced dictionary

On recent Kobo firmware the installation process is very straightforward: just copy the produced dictionary file to .kobo/custom-dict/dicthtml-ja-en.zip on your Kobo device.

Note that the filename is important: Kobo e-readers use the dictionary filename to determine the type of dictionary and what language(s) it's for. Your Kobo may fail to register the dictionary if you name it incorrectly.

If you've generated a Japanese-Japanese dictionary, you can use the filename dicthtml-ja.zip instead.
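The copy step can be scripted so that the filename is always one of the two mentioned above. This is a sketch under assumptions: install_kobo_dict is a hypothetical helper, the Kobo's mount point varies by system and must be supplied by you, and creating .kobo/custom-dict when it is missing is a convenience that may not be needed on your device.

```shell
# Hypothetical helper: copy a generated dictionary onto a mounted Kobo.
# $1 = generated dictionary file
# $2 = Kobo mount point (assumption: system-dependent, e.g. wherever your OS mounts it)
# $3 = target filename (optional, defaults to dicthtml-ja-en.zip)
install_kobo_dict() {
    src=$1
    kobo=$2
    name=${3:-dicthtml-ja-en.zip}
    # Only accept the two filenames this README mentions.
    case $name in
        dicthtml-ja-en.zip|dicthtml-ja.zip) ;;
        *) echo "unexpected dictionary filename: $name" >&2; return 1 ;;
    esac
    # Refuse to write anywhere that does not look like a Kobo volume.
    if [ ! -d "$kobo/.kobo" ]; then
        echo "no .kobo directory under $kobo" >&2
        return 1
    fi
    mkdir -p "$kobo/.kobo/custom-dict" &&
        cp "$src" "$kobo/.kobo/custom-dict/$name"
}

# Example (the mount point is a placeholder):
# install_kobo_dict dicthtml-ja-en.zip /path/to/KOBOeReader
```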

Using the dictionary

After installation, you can use it just like any other dictionary on the Kobo. It will show up as 日本語 - English (Custom) in your Kobo's dictionary drop-down list.

The dictionary entries look roughly like this (as best I can approximate with markdown):

たべる [2]   — 【食べる/喰べる】 verb, ichidan, transitive

  1. to eat
  2. to live on (e.g. a salary); to live off; to subsist on

The entry header (at the top) consists of four parts in this order:

  1. Pronunciation in hiragana.
  2. Pitch accent, enclosed in square brackets. This is absent if you didn't provide a pitch accent file or if the word isn't in that file.
  3. Written forms, enclosed in lenticular brackets (【】). Generally the more common forms are listed first.
  4. Grammatical information, in a comma-separated list. This is always present for verbs and i-adjectives, but is otherwise (intentionally) usually absent. The rationale for this minimalism is that (a) this is a reading-oriented dictionary, and (b) most of the remaining grammatical information is fairly obvious from context or from the translations/definitions.

After the entry header is a numbered list of translations/definitions, generally with more common usages closer to the top.
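To make the header layout concrete, here is a small illustrative sketch that assembles the four parts in order and omits the optional ones when they are empty. It mirrors the plain-text approximation shown above (with single spaces between parts), not the markup the tool actually writes into the dictionary file, and format_header is a hypothetical name.

```shell
# Hypothetical helper: assemble an entry header from its four parts.
# $1 = reading in hiragana
# $2 = pitch accent (may be empty)
# $3 = written forms, already joined with "/"
# $4 = grammatical information, comma-separated (may be empty)
format_header() {
    reading=$1
    pitch=$2
    forms=$3
    grammar=$4
    header=$reading
    # Pitch accent is optional, so only add the brackets when it is present.
    if [ -n "$pitch" ]; then
        header="$header [$pitch]"
    fi
    header="$header — 【$forms】"
    # Grammatical information is also optional.
    if [ -n "$grammar" ]; then
        header="$header $grammar"
    fi
    printf '%s\n' "$header"
}

# format_header たべる 2 '食べる/喰べる' 'verb, ichidan, transitive'
```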

Requirements

To build, you just need a standard installation of Rust. You can then build this project with the typical cargo build --release command.

To run, you also need:

  • A good bit of free RAM (around 2 GB). The tool deals with a lot of data, and I put zero effort into making it memory-efficient because I don't expect it to be run frequently.
  • The marisa-build executable from the Marisa Trie project installed and in your path.
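Before building or running, you may want to confirm that the required executables are actually on your path. The sketch below uses the standard POSIX command -v check; require_cmd is a hypothetical helper name, and it does not check free memory, since that is platform-specific.

```shell
# Hypothetical helper: succeed if the named command is on the PATH,
# otherwise print a message to stderr and fail.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "missing required command: $1" >&2
    return 1
}

# Check the build and run requirements listed above:
# require_cmd cargo && require_cmd marisa-build
```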

License

This project is licensed under either of

at your option.

Contributing

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you will be licensed as above, without any additional terms or conditions.