Build

Scripts and stylesheets for converting the XML dump files to various XML-based dictionary formats.

Usage

The fastest way to get started is to run the conversions with Docker.

Create the following directories:

  • $HOME/hunnor
  • $HOME/hunnor/build
  • $HOME/hunnor/data
  • $HOME/hunnor/deploy
  • $HOME/hunnor/tools
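
The directories listed above can be created in one command. This is a minimal sketch, assuming a POSIX shell and that $HOME/hunnor is where you want the files to live:

```shell
# Create the working directories used by the Docker-based conversion.
# -p creates missing parents and does not fail if a directory already exists.
mkdir -p \
  "$HOME/hunnor/build" \
  "$HOME/hunnor/data" \
  "$HOME/hunnor/deploy" \
  "$HOME/hunnor/tools"
```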

Download XML dump files from Dropbox:

  • Go to https://dict.hunnor.net/about
  • Click on the link to the Dropbox folder with the database exports (look for Adatbázisok or Dropbox)
  • Download and extract HunNor-XML-HN.xml.gz and HunNor-XML-NH.xml.gz
  • Move the extracted XML files to the data directory
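
The last two steps above can be scripted once the archives are downloaded. The following is only a sketch: extract_dumps is a hypothetical helper, not part of this repository, and it assumes the extracted files should keep the archive names without the .gz suffix.

```shell
# Hypothetical helper: extract both dump archives into the data directory.
# $1 = directory containing the downloaded .gz files
# $2 = target data directory (for example $HOME/hunnor/data)
extract_dumps() {
  download_dir=$1
  data_dir=$2
  for name in HunNor-XML-HN HunNor-XML-NH; do
    archive="$download_dir/$name.xml.gz"
    # Stop early with a message if an archive has not been downloaded yet
    if [ ! -f "$archive" ]; then
      echo "Missing archive: $archive" >&2
      return 1
    fi
    # -c writes the decompressed data to stdout and keeps the archive
    gunzip -c "$archive" > "$data_dir/$name.xml" || return 1
  done
}
```

For example, if the browser saved the archives to $HOME/Downloads (an assumption), run: extract_dumps "$HOME/Downloads" "$HOME/hunnor/data"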

Some conversions require tools that are not included in the Docker image:

Target                             Tool                             Installation
export.android.generate            export-lucene-indexer-1.0.0.jar  Package the export-lucene repo with Maven
export.kindle.compile.nb           kindlegen                        Downloaded automatically by ant
export.xdxf.pocketbook.compile.hu  converter.exe                    Downloaded automatically by ant
export.xdxf.pocketbook.compile.nb  converter.exe                    Downloaded automatically by ant

Start the Docker container:

docker run \
  --name export-ant \
  --volume $HOME/hunnor/data:/data \
  --volume $HOME/hunnor/deploy:/deploy \
  --volume $HOME/hunnor/tools:/tools \
  hunnordict/export-ant \
  -Dbuild.dir=/data \
  -lib /tools/export-lucene-indexer-1.0.0.jar \
  export

To run a different target, replace the final 'export' argument with the name of that target.

Depending on the target, the generated files will be in either the build or the deploy directory.

Formats

Check the formats directory for the available formats.

Each format can be generated by applying XSLT transformations to the XML dump files.

While some formats use the XML dump files directly, others use an intermediate, pre-processed XML file that can be generated by applying the simple-html stylesheet to the XML dump files. Check the stylesheets and the XSpec tests to determine the proper input format.

Apple Dictionary

The format of Apple's Dictionary app for Mac. iOS uses the same dictionary file format, but there is no officially supported way to install custom dictionaries on iOS.

The compiler is part of Apple's Dictionary Development Kit, which is available in the Additional Tools for Xcode package from the Apple SDK download page. The dictionary can only be compiled on a Mac.

Source files: HunNor-Apple-[HN|NH].xml, HunNor-Apple-[HN|NH]-PList.xml

Babylon

The proprietary, legacy format of the Babylon dictionary.

Source files: HunNor-BB-[HN|NH].gls.gz

Kindle

Dictionary for Kindle e-book readers. Only the Norwegian-Hungarian direction is generated. Because Kindle's language support is limited, the dictionary's language is set to Portuguese.

Lucene

A Lucene index directory with both directions, used by the native Android app. A spell-checking index is included in a separate directory. The index is built with Lucene 3.6.2 for compatibility with Android.

PDF

Separate PDF files for each direction, generated with Apache FOP.

Realm

A Realm database with both directions. Realm supports several cross platform frameworks as well as native Android and iOS apps.

SDictionary

The custom format of the SDictionary Project. The textual source format compiles to an open source binary format.

Source files: HunNor-SD-[HN|NH].sdct.gz
Compiled files: HunNor-SD-[HN|NH].dct

SQLite

An SQLite database with both directions, used by the cross platform mobile apps.

Normalization

The _ascii columns in the roots and inflections tables contain written forms using ASCII characters only. These values are created with the following process:

  1. Replace æ with ae
  2. Replace ø with o
  3. Apply the XPath function normalize-unicode with normalization form NFKD
  4. Remove characters in the Unicode block Combining Diacritical Marks (U+0300 to U+036F)

The XSpec tests contain test cases for common Hungarian and Norwegian characters.

Applications can apply the same transformation to user input. Sample code for some languages, using the String variable term:

Java

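import java.text.Normalizer;

if (term != null) {
  // Decompose accented characters, then drop the combining marks
  term = Normalizer.normalize(term, Normalizer.Form.NFKD)
    .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
    // æ and ø have no Unicode decomposition, so replace them explicitly
    .replace("æ", "ae").replace("ø", "o")
    .replace("Æ", "AE").replace("Ø", "O");
}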

JavaScript

if (term) {
  term = term
    .replace(/æ/g, "ae").replace(/ø/g, "o")
    .replace(/Æ/g, "AE").replace(/Ø/g, "O")
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "");
}

StarDict

Source files and compiled dictionaries in StarDict format. The files marked with NoSym-Number are recommended for Windows phones.

Source files: HunNor-ST-[HN|NH].xml.gz, HunNor-ST-[HN|NH]-NoSym-Number.xml.gz
Compiled files: HunNor-ST-[HN|NH].zip, HunNor-ST-[HN|NH]-NoSym-Number.zip

XDXF

Source files in the XML Dictionary Exchange Format. The conversion uses a legacy version of the format, to be compatible with the PocketBook dictionary compiler. The DTD and support files for compilation are in the pocketbook directory.

Source files: HunNor-XDXF-L-[HN|NH].xdxf.gz