Based on the extensive work of Paul Fièvre, we have been working on a DraCor-ready version of Théâtre Classique. FreDraCor is intended to be a valid TEI P5 resource.
By now, the 1560 files from the source have been structurally cleaned and converted. We copied some information from the castList to the particDesc section and tried to preserve as much as possible.
Besides the fact that all texts are out of copyright, the files (including the intellectual work represented as markup) are, according to the source files, licensed under a CC BY NC SA 4.0 licence.
The corpus can be explored at dracor.org/fre.
… we suggest the following:
- French Drama Corpus (FreDraCor): A TEI P5 Version of Paul Fièvre's "Théâtre Classique" Corpus. Edited by Carsten Milling, Frank Fischer and Mathias Göbel. Hosted on GitHub, 2021–. https://github.com/dracor-org/fredracor
Here are, among others, the most significant modifications performed on the original documents:
- add TEI namespace
- add XML declaration
- replace the teiHeader with a DraCor-specific version while preserving as much as possible of original content
- refine licence statement using current version 4.0 of the given licence and adding URL
- add a particDesc
- transform the @xml:id and @who attributes into proper IDs and ID references
- transform numeral @id into @n on tei:s and tei:l
- replace @id with @corresp at castItem/role
- upper-case tei:l/@part
- remove instances of tei:l/@syll (commented)
- remove unknown attributes from tei:role (commented)
- rename docDate/@value to docDate/@when
- transform addresse element to tei:opener/tei:salute
- transform signature element to tei:signed
- remove empty @type
- add written and print dates where available
- adjust case of character names (dracor-org#14)
- normalize author names
- add Wikidata IDs for authors and plays (work in progress)
For comprehensive insight into our changes see both the adjustments made on the dracor branch of the theatre-classique repository and the tc2dracor.xq transformation script.
Each FreDraCor play is given a DraCor ID (e.g. fre000784). These IDs are mapped to the Théâtre Classique documents in ids.xml. When a new play from Théâtre Classique is added to the corpus, a new ID needs to be assigned and added to ids.xml.
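When assigning a new ID it can help to see which ID is currently the highest one in use. The following is a minimal sketch and not part of the repository: the function name highest_dracor_id is hypothetical, and the only assumption about ids.xml is that IDs occur somewhere in its text as fre followed by six digits, as in the example above. How the next ID is then chosen is left to the maintainers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): print the highest DraCor ID
# found in a file. Assumes IDs look like "fre" + six digits, e.g. fre000784.
highest_dracor_id() {
  local file=$1
  # grep -o prints every match on its own line. Because the numbers are
  # zero-padded to a fixed width, a plain lexicographic sort orders them
  # numerically, so the last line is the highest ID.
  grep -oE 'fre[0-9]{6}' "$file" | sort | tail -n 1
}
```

It could be called as `highest_dracor_id ids.xml` from the root of the repo.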
To check the current validation status of the corpus against the tei_all schema, run ./validate from the root of the repo. (You will need to have Jing installed for this to work.)
In fact, this script can be used to validate any directory of TEI documents. Just pass the directory as the first argument. For instance, if you have gerdracor checked out next to fredracor, try:
./validate ../gerdracor/tei
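For readers who want to see roughly what such a check involves, here is a minimal sketch of counting the documents in a directory that fail validation. It is not the repository's validate script. It assumes that Jing is invoked as `jing SCHEMA FILE` and exits with a non-zero status for an invalid document; the function name count_invalid, the VALIDATOR variable and the schema path in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the repo's ./validate): count the *.xml files in
# a directory that do not validate against a RELAX NG schema.
# VALIDATOR can be overridden; it defaults to the jing command.
count_invalid() {
  local schema=$1 dir=$2 failed=0 f
  for f in "$dir"/*.xml; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern in an empty directory
    # A non-zero exit status of the validator is counted as a failure.
    "${VALIDATOR:-jing}" "$schema" "$f" >/dev/null 2>&1 || failed=$((failed + 1))
  done
  echo "$failed"
}
```

A call might look like `count_invalid path/to/tei_all.rng tei`, where the schema path is a placeholder.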
To build the FreDraCor documents from the Théâtre Classique sources, a scripted workflow has been set up that processes the original files with an XQuery transformation. To speed up the process, multiple eXist-db instances can be started in parallel using either Podman or Docker. These are the main steps of this workflow:
- start one or more pods (or containers) running eXist-db
- load the transformation XQuery tc2dracor.xq and auxiliary files (authors.xml, ids.xml) to the database(s)
- process each source file by posting it to the transformation XQuery and storing the output to the tei directory
- stop and remove all pods (or containers)
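To illustrate how source files could be spread over several database instances, here is a minimal sketch of a round-robin assignment. How the conversion script actually distributes the work is not described here, so the round-robin strategy and the function name assign_round_robin are assumptions made purely for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: assign files to N workers in round-robin order.
# Prints one "WORKER_INDEX<TAB>FILE" line per file; indices start at 0.
assign_round_robin() {
  local n=$1 i=0 f
  shift
  [ "$n" -ge 1 ] || return 1   # at least one worker is required
  for f in "$@"; do
    printf '%d\t%s\n' $(( i % n )) "$f"
    i=$(( i + 1 ))
  done
}
```

With two workers, the first, third, fifth file and so on go to worker 0 and the others to worker 1, so each instance receives a similar number of documents.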
./tc2dracor [options] SOURCE_FILE [SOURCE_FILE...]
The conversion script expects one or more source files as its arguments. These would typically be files from the xml directory of the checked out dracor branch of the theatre-classique repository:
./tc2dracor ../theatre-classique/xml/*.{xml,XML}
NOTE: The dracor branch of the theatre-classique repo contains corrections and amendments to the original source files that the conversion script relies on but that have not (yet) been adopted upstream.
NOTE: For the attribution of DraCor IDs to work, the file names of the source files need to match the ones of the original documents used in ids.xml (see DraCor IDs).
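Before running a conversion, one might want to check which source files are not mentioned in ids.xml at all. The sketch below is a rough, hypothetical helper and not part of the repository: it only tests whether the base name of each file occurs anywhere in the text of ids.xml, without assuming a particular element structure, so a file name that is a substring of another one would also count as present.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: list the source files whose base name does not occur
# anywhere in the ID mapping file (plain substring search, no XML parsing).
unmapped_sources() {
  local ids=$1 f
  shift
  for f in "$@"; do
    # -F treats the file name as a literal string, -q suppresses output.
    grep -qF -- "$(basename "$f")" "$ids" || echo "$f"
  done
}
```

It could be called as `unmapped_sources ids.xml ../theatre-classique/xml/*.xml`; every file it prints would need an entry in ids.xml first.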
Display usage information and exit.
Number of pods or containers to start. Default: 1
As an alternative to using containers, an eXist database already running on localhost can be used by passing its port number. With this option the sources will be copied to the /db/tc2dracor/sources collection of this database. No parallel processing will take place.
Directory to write the created TEI files to. Default: ./tei
By default the conversion script uses podman but falls back to docker if podman is not available. This flag allows you to force the use of docker even when podman is available.
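The fallback behaviour described above can be sketched as follows. This is an illustrative reimplementation under assumptions, not the code of the conversion script: the function name pick_runtime and its yes/no argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: choose the container runtime.
# Prefers podman, falls back to docker; passing "yes" forces docker.
pick_runtime() {
  local force_docker=${1:-no}
  if [ "$force_docker" != yes ] && command -v podman >/dev/null 2>&1; then
    echo podman
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  else
    echo "neither podman nor docker found" >&2
    return 1
  fi
}
```

A caller could store the result, for example `runtime=$(pick_runtime)`, and use `"$runtime"` in place of a hard-coded command name.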
The script can be run with an optional progress bar shown in the terminal.
Hint: For debugging across multiple containers, the combined log from every pod can be viewed while the conversion is running:
podman logs -f $(cat $(ls -rtd /tmp/tc2dracor-* | tail -1)/containers)
For debugging purposes the logs of all containers are also stored in a temporary working directory after the transformation has finished. Use the -v option to see the exact location of these files at the end of the script run.
As of now, 76 documents do not yet comply with the TEI-All schema. See the list of open issues for details on this and possible other enhancements.