Convert bibliographic metadata in METS/MODS format to TEI headers and, optionally, serialize linked ALTO-encoded OCR to TEI text.
MODS is the de facto standard for encoding bibliographic metadata in libraries. It is usually embedded as a separate section in METS XML files. The physical and logical structure of a document is expressed in terms of structural mappings (structMap elements).
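To make this layout concrete, the following minimal sketch parses a small hand-written METS document with Python's standard library and lists the embedded MODS title and the types of the structMap elements. The sample document, its IDs, and the helper name summarize_mets are illustrative assumptions and are not taken from mets-mods2tei itself.

```python
import xml.etree.ElementTree as ET

# Official namespace URIs for METS and MODS.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

# Hand-written toy document (assumption): real METS files are far larger.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
                          xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMDLOG_0000">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:titleInfo><mods:title>Example title</mods:title></mods:titleInfo>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap TYPE="LOGICAL">
    <mets:div ID="LOG_0000" TYPE="monograph" DMDID="DMDLOG_0000"/>
  </mets:structMap>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div ID="PHYS_0000" TYPE="physSequence"/>
  </mets:structMap>
</mets:mets>"""


def summarize_mets(xml_text):
    """Return (MODS titles, structMap TYPE values) found in a METS document."""
    root = ET.fromstring(xml_text)
    # MODS records live inside the descriptive metadata sections.
    titles = [
        t.text
        for t in root.findall(".//mods:mods/mods:titleInfo/mods:title", NS)
    ]
    # structMap elements are direct children of the METS root.
    struct_types = [s.get("TYPE") for s in root.findall("mets:structMap", NS)]
    return titles, struct_types


titles, struct_types = summarize_mets(SAMPLE_METS)
print(titles, struct_types)
```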
TEI is the de facto standard for representing digital text for research purposes. It usually includes detailed bibliographic metadata in its header.
Since these standards allow considerable degrees of freedom, the conversion targets well-defined subsets. For MODS, this is the MODS Anwendungsprofil für digitalisierte Medien. For METS, the METS Anwendungsprofil für digitalisierte Medien 2.1 is used. The TEI header is roughly based on the DTA base format.
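As a rough illustration of the kind of mapping involved, the sketch below copies a MODS title into the titleStmt of a bare TEI header using only the standard library. It is a simplified assumption about the general idea, not the conversion logic of mets-mods2tei; the function name mods_title_to_tei and the sample record are hypothetical, and a header following the DTA base format contains many more elements.

```python
import xml.etree.ElementTree as ET

NS = {
    "mods": "http://www.loc.gov/mods/v3",
    "tei": "http://www.tei-c.org/ns/1.0",
}

# Hand-written toy MODS record (assumption).
SAMPLE_MODS = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Example title</mods:title></mods:titleInfo>
</mods:mods>"""


def tei_tag(name):
    """Build a namespace-qualified TEI tag name for ElementTree."""
    return "{%s}%s" % (NS["tei"], name)


def mods_title_to_tei(mods_xml):
    """Create a skeletal TEI element whose header carries the MODS title."""
    mods = ET.fromstring(mods_xml)
    title = mods.findtext("mods:titleInfo/mods:title", default="", namespaces=NS)

    # Nest TEI > teiHeader > fileDesc > titleStmt > title.
    tei = ET.Element(tei_tag("TEI"))
    header = ET.SubElement(tei, tei_tag("teiHeader"))
    file_desc = ET.SubElement(header, tei_tag("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei_tag("titleStmt"))
    ET.SubElement(title_stmt, tei_tag("title")).text = title
    return tei


tei = mods_title_to_tei(SAMPLE_MODS)
print(ET.tostring(tei, encoding="unicode"))
```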
mets-mods2tei is developed at the Saxon State and University Library in Dresden.
mets-mods2tei is implemented in Python 3. The following steps assume a working Python 3 installation (tested with versions 3.5, 3.6 and 3.7).
The first installation step is to clone the repository:
$ git clone https://github.com/wrznr/mets-mods2tei.git
$ cd mets-mods2tei
Using virtualenv is highly recommended, although not strictly necessary for installing mets-mods2tei. It can be installed via:
$ [sudo] pip install virtualenv
Create a virtual environment in a subdirectory of your choice (e.g. env) using
$ virtualenv -p python3 env
and activate it:
$ . env/bin/activate
mets-mods2tei can then be installed via pip:
(env) $ pip install .
mets-mods2tei uses pytest-based testing.
Install the test requirements:
(env) $ pip install -r requirements-test.txt
Run the tests via:
(env) $ pytest
Determine code coverage by running
(env) $ make coverage
Installing mets-mods2tei makes the command line tool mm2tei available:
(env) $ mm2tei --help
Usage: mm2tei [OPTIONS] METS
METS: File containing or URL pointing to the METS/MODS XML to be converted
Options:
-o, --ocr Serialize OCR into resulting TEI
-l, --log-level [DEBUG|INFO|WARN|ERROR|OFF]
--help Show this message and exit.
It reads METS XML from a URL or file argument and prints the resulting TEI, including the information extracted from the MODS part of the METS.
(env) $ mm2tei "https://digital.slub-dresden.de/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:de:slub-dresden:db:id-453779263"
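For batch processing, the tool can also be driven from a Python script. The sketch below builds the argument list from the options shown in the help text above and, in a separate function, runs the command and captures its output. The helper names build_mm2tei_command and convert are hypothetical, and the sketch assumes that mm2tei is on the PATH and writes the TEI to standard output.

```python
import subprocess


def build_mm2tei_command(mets, ocr=False, log_level=None):
    """Assemble the mm2tei argument list for a METS file or URL."""
    cmd = ["mm2tei"]
    if ocr:
        cmd.append("--ocr")  # serialize OCR into the resulting TEI
    if log_level is not None:
        cmd.extend(["--log-level", log_level])  # DEBUG, INFO, WARN, ERROR or OFF
    cmd.append(mets)
    return cmd


def convert(mets, **options):
    """Run mm2tei and return what it prints (assumed to be the TEI document)."""
    result = subprocess.run(
        build_mm2tei_command(mets, **options),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Only the command is built here; convert() requires an installed mm2tei.
command = build_mm2tei_command("mets.xml", ocr=True, log_level="DEBUG")
print(command)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.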