Since Kindle ebook readers unfortunately don't come with any Norwegian (Bokmål) dictionaries, here is a simple way to create one based on dict.cc data. The resulting dictionary can be used like any other Kindle dictionary: in-document word look-up (including inflected forms), vocabulary trainer, and browsing the dictionary. It contains approx. 24,800 uninflected NB > DE entries plus regularly and irregularly inflected forms for most verbs, nouns and adjectives.
With slight changes, these files can be used to create bilingual dictionaries based on other dict.cc language pairs.
- Get the dictionary source data from dict.cc's download page and save it as `data/dict.cc/dict.cc.tsv`.
- Get the files `lemma.txt` and `fullformsliste.txt` from Språkbankens ressurskatalog and save them in `data/spraakbanken/`.
- Get a list of Bokmål stop words (for instance via ranks.nl) and save it as `data/stopwords/stopwords.txt` (one word per line).
- Convert the TSV file into an appropriately formatted HTML file:
  `python transform.py > NB_DE_dict.html`
- Install KindleGen and use it to convert the dictionary into a `MOBI` file. The conversion requires the following files:
  - `NB_DE_dict.opf`: Contains information on the files used for `MOBI` conversion and general metadata about the dictionary.
  - `NB_DE_dict.html`: Contains the actual dictionary entries.
  - `NB_DE_dict.jpeg`: The cover image (useless, but required for creating the `MOBI` file).

  `kindlegen.exe NB_DE_dict.opf -c2 -verbose -dont_append_source`
- (Optional) Use the Kindle Previewer to preview the dictionary. Note that this only allows you to view the dictionary as if it were a regular book, but you unfortunately cannot try it out on an actual book in preview mode.
- Copy the `MOBI` file to the directory `documents/dictionaries/` on your Kindle. You may need to restart the device afterwards (especially if you are updating the dictionary).
If you are using Windows, you can execute steps 4 and 5 (the HTML conversion and the `MOBI` conversion) at once by executing `run.bat`.
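To make the HTML conversion step more concrete, here is a minimal sketch of what such a conversion could look like. It is not the code in `transform.py`. The function names are hypothetical, the TSV layout (tab-separated source term, target term and word class, with `#` comment lines) is an assumption about the dict.cc export, and the `idx:` tags follow the dictionary markup as I understand it from the Kindle Publishing Guidelines. Annotations such as `{m}` are left in the headword here; a real conversion would need to strip them.

```python
import csv
import html
import io


def parse_dictcc(tsv_text):
    """Yield (source, target, word_class) tuples from dict.cc-style TSV text."""
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in rows:
        # Skip blank lines, '#' comment lines and rows without a translation.
        if not row or row[0].startswith("#") or len(row) < 2:
            continue
        word_class = row[2] if len(row) > 2 else ""
        yield row[0].strip(), row[1].strip(), word_class.strip()


def entry_html(headword, translation, inflected_forms=()):
    """Render one dictionary entry with idx:* markup (assumed tag layout)."""
    iforms = "".join(
        '<idx:iform value="{}"/>'.format(html.escape(form, quote=True))
        for form in inflected_forms
    )
    # Only emit the inflection group when there is at least one form.
    infl = "<idx:infl>{}</idx:infl>".format(iforms) if iforms else ""
    return (
        '<idx:entry scriptable="yes">'
        '<idx:orth value="{value}"><b>{text}</b>{infl}</idx:orth>'
        "<p>{translation}</p>"
        "</idx:entry>"
    ).format(
        value=html.escape(headword, quote=True),
        text=html.escape(headword, quote=False),
        infl=infl,
        translation=html.escape(translation, quote=False),
    )


# Illustrative sample data, not taken from the real dict.cc file.
sample = "# comment header\n\nbil {m}\tAuto {n}\tnoun\n"
entries = list(parse_dictcc(sample))
page = entry_html("bil", "Auto {n}", ["bilen", "biler", "bilene"])
```

The inflected forms listed in `idx:iform` are what allow the Kindle to resolve a look-up of an inflected word to its headword.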
To uninstall, go to `documents/dictionaries/` and delete `NB_DE_dict.mobi` as well as `NB_DE_dict.sdr/`.
- In the `OPF` file, update the dictionary title, languages and all relevant file names.
- If the dictionary data is not in the dict.cc format, either re-format it accordingly or change the way the file is parsed in `transform.py`.
- Create a class that can generate inflected forms and that extends the `Inflector` class (`inflector.py`). Use it as the `Inflector` class in `transform.py`.
- Follow the steps above for creating & installing a new dictionary.
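As a rough illustration of the subclassing step, the sketch below defines a stand-in `Inflector` base class and a subclass that appends a fixed set of suffixes per part of speech. The method name `inflect`, its signature and the class `SuffixInflector` are assumptions made for this example; the real interface is whatever `inflector.py` defines, so check that file before adapting this.

```python
class Inflector:
    """Stand-in for the base class in inflector.py (interface assumed)."""

    def inflect(self, lemma, pos):
        """Return a list of inflected forms for the lemma; none by default."""
        return []


class SuffixInflector(Inflector):
    """Hypothetical subclass: builds forms by appending suffixes per POS."""

    def __init__(self, suffixes_by_pos):
        # Example: {"noun": ["en", "er", "ene"]}
        self.suffixes_by_pos = suffixes_by_pos

    def inflect(self, lemma, pos):
        # Unknown parts of speech yield no inflected forms.
        return [lemma + suffix for suffix in self.suffixes_by_pos.get(pos, [])]


# Regular masculine Bokmål noun endings, used purely as an illustration.
inflector = SuffixInflector({"noun": ["en", "er", "ene"]})
forms = inflector.inflect("bil", "noun")
```

A purely suffix-based approach only covers regular paradigms; irregular forms still need a look-up list, which is the role the Språkbanken data plays for Bokmål.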
- Generate inflections (nouns, adjectives, verbs).
- Regular inflections (from Språkbanken where available, otherwise generated according to regular inflection paradigms)
- Irregular inflections (from Språkbanken's list)
- Genitive forms
- Multi-token entries (in particular: phrasal verbs)
- Deal with parentheses and ellipses in Norwegian entries.
- Merge entries for identical Norwegian words (e.g. `blomsterbutikk`).
  - Extend this to `[kvinnelig]` entries.
- Show relevant multi-token entries when looking up single-token entries (e.g. the entry for `blå` (blue) also contains information on the phrase `å være i det blå` (to be in the dark), which is also a distinct entry).
  - I don't check for POS tags when creating these references; therefore, there are some false positives here. Since I find them quite interesting, I don't plan on refining this.
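The merging of identical headwords and the references from single tokens to multi-token entries can be sketched roughly as follows. This is an illustration, not the project's actual code: the function names are hypothetical, the sample translations are made up for the example, and using the stop word list to filter reference tokens is my assumption about what that list is for. As in the description above, no POS check is done, so false positives are possible.

```python
from collections import defaultdict


def merge_entries(pairs):
    """Group translations under identical headwords, keeping first-seen order."""
    merged = defaultdict(list)
    for headword, translation in pairs:
        if translation not in merged[headword]:
            merged[headword].append(translation)
    return dict(merged)


def cross_references(headwords, stopwords=frozenset()):
    """Map each non-stop-word token to the multi-token headwords containing it."""
    refs = defaultdict(list)
    for headword in headwords:
        tokens = headword.split()
        if len(tokens) < 2:
            continue  # single-token entries are never reference targets
        for token in dict.fromkeys(tokens):  # de-duplicate, keep order
            if token not in stopwords:
                refs[token].append(headword)
    return dict(refs)


# Illustrative data only.
merged = merge_entries([
    ("blomsterbutikk", "Blumenladen {m}"),
    ("blomsterbutikk", "Blumengeschäft {n}"),
    ("blå", "blau"),
])
refs = cross_references(["blå", "å være i det blå"], stopwords={"å", "i", "det"})
```

With this data, `refs["blå"]` lists the phrase `å være i det blå`, so the entry for `blå` could show it alongside its own translations.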
- Extend the dictionary.
- Note: Unless compound nouns are in the dictionary, it's not possible to look them (or their constituents) up. Since I cannot change the way the dictionary is used to look up entries, there is not much I can do.
- Look into adding Wiktionary data, specifically from the English or Norwegian versions of Wiktionary.
- The best (monolingual) Norwegian dictionary I know is https://ordbok.uib.no/, whose database I unfortunately cannot download and use. But maybe there are other good monolingual dictionaries out there that I can use?
- Written Danish and Bokmål are very similar. If I can find a large DA>EN or DA>DE dictionary, it could be worth looking into adding these entries where no Norwegian entries are present.
- What about Norsk Ordvev (Norwegian WordNet) for (monolingual) thesaurus-like information?
- Dict.cc data. NB > DE translation data.
- Norsk Ordbank in Norwegian Bokmål 2005 (Språkbankens ressurskatalog). Lists of Norwegian lemmas and inflected forms.
- Norwegian stop words (Ranks NL).
- Amazon Kindle Publishing Guidelines. This document describes how to create files that can be converted into `MOBI` files. There is also a section on creating dictionaries.
- KindleGen is used for creating `MOBI` files.
  - Sample files: http://kindlegen.s3.amazonaws.com/samples.zip. They give an impression of what `OPF`/`HTML` files should look like so they can be converted into `MOBI` files.
- Kindle Previewer