Scripts and stylesheets for converting the XML dump files to various XML-based dictionary formats.
The fastest way to get started is to run the conversions with Docker.
Create the following directories:
$HOME/hunnor
$HOME/hunnor/build
$HOME/hunnor/data
$HOME/hunnor/deploy
$HOME/hunnor/tools
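As a minimal sketch, the directories listed above can be created in one step from a POSIX shell. The paths are exactly the ones named above; nothing else is assumed.

```shell
# Create the working directories listed above.
# mkdir -p also creates $HOME/hunnor itself and does nothing
# for directories that already exist.
mkdir -p \
  "$HOME/hunnor/build" \
  "$HOME/hunnor/data" \
  "$HOME/hunnor/deploy" \
  "$HOME/hunnor/tools"
```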
Download XML dump files from Dropbox:
- Go to https://dict.hunnor.net/about
- Click on the link to the Dropbox folder with the database exports (look for Adatbázisok or Dropbox)
- Download and extract HunNor-XML-HN.xml.gz and HunNor-XML-NH.xml.gz
- Move the extracted XML files to the data directory
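The extract-and-move step can be sketched as a small shell function. The function name extract_dumps, the DOWNLOAD_DIR variable, and the $HOME/Downloads default are assumptions made for illustration; adjust them to wherever the archives were actually saved.

```shell
# Hypothetical helper (not part of the repository): decompress both dump
# archives from a download directory into the data directory.
extract_dumps() {
  download_dir="$1"
  data_dir="$2"
  mkdir -p "$data_dir"
  for name in HunNor-XML-HN HunNor-XML-NH; do
    archive="$download_dir/$name.xml.gz"
    if [ -f "$archive" ]; then
      # gunzip -c writes to stdout, so the downloaded archive is left in place.
      gunzip -c "$archive" > "$data_dir/$name.xml"
    else
      echo "Missing $archive" >&2
    fi
  done
}

# Assumption: the archives were saved to $HOME/Downloads unless DOWNLOAD_DIR is set.
extract_dumps "${DOWNLOAD_DIR:-$HOME/Downloads}" "$HOME/hunnor/data"
```

A missing archive is reported on standard error and skipped, so the call can be repeated after the download completes.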
Some conversions require tools that are not included in the Docker image:
| Target | Tool | Installation |
|---|---|---|
| export.android.generate | export-lucene-indexer-1.0.0.jar | Package the export-lucene repo with Maven |
| export.kindle.compile.nb | kindlegen | Downloaded automatically by Ant |
| export.xdxf.pocketbook.compile.hu | converter.exe | Downloaded automatically by Ant |
| export.xdxf.pocketbook.compile.nb | converter.exe | Downloaded automatically by Ant |
Start the Docker container:
docker run \
  --name export-ant \
  --volume $HOME/hunnor/data:/data \
  --volume $HOME/hunnor/deploy:/deploy \
  --volume $HOME/hunnor/tools:/tools \
  hunnordict/export-ant \
  -Dbuild.dir=/data \
  -lib /tools/export-lucene-indexer-1.0.0.jar \
  export
You can replace 'export' with the target you want to run.
Depending on the target, the generated files will be in either the build or the deploy directory.
Check the formats directory for the available formats.
Each format can be generated by applying XSLT transformations to the XML dump files.
While some formats use the XML dump files directly, others use an intermediate, pre-processed XML file that can be generated by applying the simple-html stylesheet to the XML dump files. Check the stylesheets and the XSpec tests to determine the proper input format.
The format of Apple's Dictionary app for Mac. iOS uses the same dictionary file format, but there is no officially supported way to install custom dictionaries on iOS.
The compiler is part of Apple's Dictionary Development Kit, which is available in the Additional Tools for Xcode package from the Apple SDK download page. The dictionary can only be compiled on a Mac.
Source files: HunNor-Apple-[HN|NH].xml, HunNor-Apple-[HN|NH]-PList.xml
The proprietary, legacy format of the Babylon dictionary.
Source files: HunNor-BB-[HN|NH].gls.gz
Dictionary for Kindle e-book readers. Only the Norwegian-Hungarian direction is generated. Because of limited language support, the dictionary is set to Portuguese.
A Lucene index directory with both directions, used by the native Android app. A spell-checking index is included in a separate directory. The index is built with Lucene 3.6.2 for compatibility with Android.
Separate PDF files for each direction, generated with Apache FOP.
A Realm database with both directions. Realm supports several cross platform frameworks as well as native Android and iOS apps.
The custom format of the SDictionary Project. The textual source format compiles to an open source binary format.
Source files: HunNor-SD-[HN|NH].sdct.gz
Compiled files: HunNor-SD-[HN|NH].dct
An SQLite database with both directions, used by the cross platform mobile apps.
The _ascii columns in the roots and inflections tables contain written forms using ASCII characters only. These values are created with the following process:
- Replace æ with ae
- Replace ø with o
- Apply the XSLT function normalize-unicode with normalization form NFKD
- Remove characters in the Unicode block Combining Diacritical Marks (U+0300 to U+036F)
The XSpec tests contain test cases for common Hungarian and Norwegian characters.
Applications can apply the same transformation to user input. Sample code for some languages, using the String variable term:
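// Java (requires: import java.text.Normalizer;)
if (term != null) {
    // Decompose accented letters, strip the combining marks,
    // then replace the letters that have no decomposition.
    term = Normalizer.normalize(term, Normalizer.Form.NFKD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
            .replace("æ", "ae").replace("ø", "o")
            .replace("Æ", "AE").replace("Ø", "O");
}

// JavaScript
if (term) {
    term = term
        .replace(/æ/g, "ae").replace(/ø/g, "o")
        .replace(/Æ/g, "AE").replace(/Ø/g, "O")
        .normalize("NFKD")
        // U+0300 to U+036F is the Combining Diacritical Marks block
        .replace(/[\u0300-\u036f]/g, "");
}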
Source files and compiled dictionaries in StarDict format. The files marked with NoSym-Number are recommended for Windows phones.
Source files: HunNor-ST-[HN|NH].xml.gz, HunNor-ST-[HN|NH]-NoSym-Number.xml.gz
Compiled files: HunNor-ST-[HN|NH].zip, HunNor-ST-[HN|NH]-NoSym-Number.zip
Source files in the XML Dictionary Exchange Format. The conversion uses a legacy version of the format for compatibility with the PocketBook dictionary compiler. The DTD and support files for compilation are in the pocketbook directory.
Source files: HunNor-XDXF-L-[HN|NH].xdxf.gz