/BibleMultiConverter

Converter written in Java to convert between different Bible program formats

Primary LanguageJavaOtherNOASSERTION

BibleMultiConverter

Converter written in Java to convert between different Bible program formats

Copyright (c) 2007-2018 Michael Schierl Licensed unter MIT License; for details, see the LICENSE file.

Usage

If you clone from Git or download a source zip, you will need a Java JDK 8, and Apache Maven 3.5 or above, to build. Just run "mvn package" and you will find a suitable distribution .zip file in the TARGET folder. Note that to build all modules (including SWORD and All-In-One), you will need to have built JSword first and installed it to your local maven repository. Alternatively, skip these modules by adding -pl !biblemulticonverter-sword,!biblemulticonverter-allinone to your Maven command line.

If you download a precompiled .zip file, you will need a Java Runtime Environment 8 or above, available from java.com. Just run

java -jar BibleMultiConverter.jar

on the command line for usage information. Each module has its own help, which can be shown by using the "help" module.

Documentation

The documentation is currently a bit lacking. Try the commands, or look at the source, or open an issue if anything is unclear.

Supported Formats

BibleMultiConverter supports three custom formats, which are loss-less (support all features supported by the BibleMultiConverter framework) and are supported both for import and for export:

  • Compact: Designed for creating small text files that compress well
  • Diffable: Designed to make comparing different bibles easier
  • RoundtripXML: Useful for interchange of modules with converters written in other programming languages (that prefer XML binding to plaintext parsing)

In addition, there are other formats that can preserve all features supported by the BibleMultiConverter framework, and can therefore used for exchanging or editing modules without loss of data:

  • RoundtripHTML: HTML format that can be read back if desired (originally intended for publishing on free website hosters, but with the advent of free file hosters this feature is pretty much obsolete).
  • RoundtripODT: Export as an editable .odt (OpenOffice/LibreOffice Document Text), which can be edited in LibreOffice (tested with LibreOffice 6.0) and later imported again. Large bibles can take a minute or so to open in LibreOffice 6, which is quite some improvement since LibreOffice 5.x sometimes took more than 15 minutes. Note that all formatting is exported as named Paragraph or Text styles, and other individual formatting will be ignored when importing.

In addition, the following other formats are supported, with varying accuracy:

  • HaggaiXML: import and export
  • OSIS: import and export (only a very limited subset of OSIS standard)
  • ZefaniaXML: import and export (There are two import filters and three export filters available that focus on different subsets/features of this quite diverse format)
  • ZefDic (Zefania's dictionary format): import and export (two export filters)
  • TheWord: import and export
  • PalmBible+: import and export (for the XML-like format used to build PDBs)
  • UnboundBible: Import and export
  • MobiPocket: export only
  • Volksbibel2000: export only
  • OnLineBible: export only
  • BrowserBible: export only
  • Quick Bible: export only
  • SWORD modules: import only (see below for details)
  • MyBible.Zone (more bibles): import and export (in a special SQLite edition)
  • Bible Analyzer: export only (text export for bibles and dictionaries, SQLite export for bibles)
  • Accordance: export only
  • BibleWorks: import and export
  • Equipd Bible: export only
  • USFM/USX: import and export (most tags are supported; not supported are \ca \cp \va \vp \fig \fm)
  • USFX/: import only

In combination with third party tools, other export formats are available:

  • In combination with LibreOffice 4.4, it is possible to export bibles for Logos Bible Software (see below for details)
  • In combination with the E-Sword ToolTipTool NT v2.51, it is possible to export bibles for E-Sword (see below for details)

While the focus of this tool is for bible texts, there is also limited support for (Strong) dictionaries.

The StrongDictionary import filter downloads a public domain Strong dictionary and compiles it for exporting as HTML, MobiPocket, Logos or ZefDic (currently no other exporters support dictionaries).

The StrongConcordance import filter takes a Strong dictionary and a Bible and augments the dictionary with concordance information (i. e. links that link back to all verses that contain this word in that particular Bible).

Three utility exporters are also available: Validate validates the syntax of a bible file, and StrippedDiffable exports a Diffable, but removes certain features (like prologs, footnotes, headlines, etc.) In case you want to rename or remove certain books automatically, have a look at the Diffable importer, which provides options for that. HeatMapHTML can generate heat maps (or verse statistics) which show how often a certain feature (e. g. a footnote, a divine name, or the word "Jesus") appears in the Bible and where exactly.

The ValidateXML tool can be used to validate an input XML file against a XSD schema. The schema can be given as a file, as an URL or one of the embedded schema names OSIS, ZefaniaXML, HaggaiXML, RoundtripXML, USFX, USX or ZefDic. This is useful as in case of an invalid XML input file, the schema usually provides better error messages than what is provided by the import modules.

The SQLiteDump tool (part of the SQLite edition) can dump SQLite databases in a diffable text format; useful for diagnosing problems with Bible programs that use SQLite based formats or for importing MyBible.Zone bibles.

The ParatextConverter tool can be used to convert between USFM/USFX/USX formats without converting to BibleMultiConverter's internal format first, or to remove tagged OT/NT/Deuterocanonical content from such a file. It can also be used to convert to ParatextDump format (which is a diffable plain text dump of the internal Paratext structure and useful for comparing different Paratext formats).

Planned formats

EPUB export is planned (but not high priority at the moment).

If you want to see any other formats, feel free to open an issue (or a pull request :-D).

Limitations

When comparing the bible formats that are currently used (both free and commercial), they can be divided into two broad categories (or paradigms).

The first category uses books, prologs, chapters, verses and head-lines as primary structural elements. As a consequence, formattings/styles cannot span verse boundaries, and paragraph separators are always inside (or often at the end of) verses.

Zefania XML, Haggai XML, TheWord, PalmBible+, e-Sword, SWORD, most online bible websites, and also the commercial MfChi format follow this category. Therefore, the format internally used by this converter follows this category.

The other category, which covers popular formats like Logos Bible Software or USFM, treats formattings and paragraphs as primary structural elements. Within these formattings, so-called milestone markers are used to denote the beginning and end of chapters and verses. Prologs, appendixes, or headlines are a concept that does not exist (and neither is required) for this paradigm - text that is inside a chapter but before the first verse happens to be a chapter prolog, for example.

Non-bible-specific Export formats like MobiPocket, HTML or ePub can also handle these formats quite well, by skipping all the milestone markers.

OSIS is some kind of hybrid format, as the creator can decide whether formatting or structural elements are represented as nested tags; the other type is then represented as milestones.

As this converter internally uses the first category, conversions between different second-category formats (like from verse-milestoned OSIS to Logos) will always lose more information than needed; if you want to convert this way, perhaps another converter tool is more appropriate for your needs. As soon as either the input or the output format is of first category, this converter probably outperforms other converters easily.

In addition, some Bible formats have very sophisticated formatting features (which are not used by most of the available modules), like several paragraph styles or even list and tables. All these formats get reduced to the bare minimum: paragraph breaks as well as line breaks with and without indentation.

Reporting exporter bugs

In case you are trying to export a module, but the exporter throws an error message you do not understand, I'd prefer if you could share a Diffable version of the module with me. However, I understand that this is not always possible, e.g. due to copyright restrictions. In that case, you can try if the bug can still be reproduced after exporting export the module using the ScrambledDiffable export format; this format is designed to leave the structure of the document intact but scramble all the (Greek and Latin) letters and digits are scrambled beyond repair (or repairable with a password if you prefer).

Try using =23 as argument first, which should replace all letters by 'X', resulting in a well compressible file. In case that one does not reproduce the bug, use without arguments (random numbers). In case you want to share a Bible where others are able to compare if their verses are the same, use '#SHA-1' as argument; that way, the same verse will scramble to the same 'ciphertext', so the resulting files are still diffable although unreadable. In case you have to be able to reverse the scrambling (if the whole file is unchanged), you can use '+Password' for initial scrambling and '-Password' for later decrypting. To verify if two bible were scrambled from the same source (using different parameters), scramble them again in constant mode, and diff the results. Note that since 'encrypting' uses a stream cipher, if you use the same password for more than one file, an attacker with cryptanalysis background that knows only this piece of information can use it for correlation attacks to get the plain text. Therefore, use different passwords for multiple bibles (like, add the bible name to them), or better, use real encryption like AES instead.

Versification handling

Most Bible formats do not care about versifications (they just store book chapter:verse without caring how many verses a certain chapter has), or support only a single versification (usually KJV or KJVA). Exceptions being SWORD and Logos, which encode the versification mapping in the bible itself. Therefore, this converter usually does not care about versifications; in case a format is limited to a versification, verses will be merged until they fit.

However, there is some support for handling "external" versification mappings (stored in .bmcv files). A Versification tool can be used to import and export versification mappings from different formats, and perform some other modifications (like renaming or deleting versifications, or joining or comparing them). A VersificationDetector export filter can be used to determine which versification in a .bmcv file fits a given bible best. And a VersificationMappedDiffable can be used to "change" the versification of an existing Bible text.

When referring to versification mappings in a file, the general syntax is from/to or from/to/number. Use the latter form (number starts with 1) in case there is more than one mapping stored in your file between the same two versifications. If you use the form without number, but there are multiple mappings, the system will automatically try to find the "best" mapping; i. e. the one that maps more verses and/or maps them more precisely (e. g. a mapping that maps Gen 1:1 to Gen 1:1 and Gen 1:2 to Gen 1:2 is more precise than a mapping that maps Gen 1:1-2 to Gen 1:1-2). In case this cannot be determined, an error message is shown.

When merging mappings from multiple files, you can come into a situation where you have two versifications that represent essentially the same bible, but have different names. Trying to join a path between these two versifications will fail (as there is no common mapping). In this case, you can write name1/name2/-1 to dynamically create an identity mapping: all verses that exist in both versifications automatically map to themselves, all other verses are unmapped.

Supported Versification formats:

  • BMCV (own database format): Import and export (versifications and mappings)
  • KJV (hard-coded KJV versification): "Import" only (no mapping)
  • CCEL: Import and export (versifications and mappings)
  • OpenScriptures: Import only (versifications only, no mappings)
  • SWORDVersification: Import versifications and mappings from SWORD

SWORD import

As the SWORD format is quite complex, I'm using a third party library JSword for parsing it. That library adds quite some footprint to the application (almost 20MB) so SWORD import is only available in a special SWORD edition, which is available as a separate download (but in the same source repository).

SWORD is special in the sense that you do not have a file to import, but a module directory and some bibles in there to import (specified by initials). Just separate those by a slash, and use this as the filename.

In case you do not have a SWORD module directory locally, you can use the SWORDDownloader tool to download some bibles from a SWORD http repository into a new module directory.

E-Sword export

To export for E-Sword, first use the ESwordHTML export filter, which generates two HTML files (.bblx.html and .cmtx.html) which can then imported into ToolTipTool and converted to an E-Sword Bible. As the HTML import filter of ToolTipTool is a bit buggy and nondeterministic (it tends to insert line breaks in the middle of lines, resulting in conversion errors; sometimes it helps to just import the same file again, sometimes not) there is a special "marker" parameter that can be set to anything that does not occur in the bible text ($EOF$ works fine usually). Then import in ToolTipTool NT (dont mind if extra newlines get added), export in ToolTipTool NT as RTF, and run ESwordRTFPostprocessor over the RTF file to fix it. The fixed RTF file can then be imported in ToolTipTool without issues and converted as desired.

[In case you know an easier way to deal with this issue, please tell me :-). If you want to contribute an E-Sword exporter that directly writes the SQL Database files, it will be very much appreciated.]

Logos Bible Software export

As the dependencies are quite large and non-free, this feature is only available in the "Logos Edition" which is a separate binary download (but included in the same source repository). As this format is quite complex (compared to the others), this export is a multi-step process:

First run the LogosVersificationDetector, which will find a verse map for you that covers (hopefully) all verses of your Bible. Then run LogosHTML to produce a HTML file, which you can open in LibreOffice (HTML Writer format) and save as .docx (Office 2007 XML format).

In case your bible contains cross references to books/verses that are not covered by your bible itself, don't forget to pass the -xref option to the versification detector, as Logos will not render datatype links to cross references that do not exist in the verse map (In case you do not get a match that covers both your verses and your xrefs, but there is a match that covers all verses, use that one, as it is better to lose datatype links for your cross references, than to lose some verses).

As LibreOffice has some limitations in exporting of hyperlinks, if your original bible contained Grammar information (Strongs or Morphology tags), you will have to run the resulting DOCX file through LogosNestedHyperlinkPostprocessor or the Grammar information will look broken in Logos.

[In case anybody wants to contribute a Logos exporter that directly writes .docx files, it will be very much appreciated.]