This repository maintains the selection of documents from the EarlyPrint Project that serve as the sources of the EPDraCor corpus.
The EarlyPrint IDs of the selected documents are maintained in the ids.txt file.
To update the selection, either from a cloned Bitbucket repository or from Bitbucket directly, use the update script:
./scripts/update --help
./scripts/update --download
./scripts/update --all --copy /path/to/local/repos
- edit ids.txt to add or remove the EP IDs of the respective EarlyPrint texts
- run
./scripts/update --download
to download new documents from the EarlyPrint Bitbucket repository and/or remove existing documents from the xml directory
- commit the changes
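The first step of this workflow can be sketched as a small shell helper. This is only an illustration, not part of the repository: the function name add_id is hypothetical, and it assumes that ids.txt holds one EarlyPrint ID per line.

```shell
# Hypothetical helper (not part of this repository): add an EarlyPrint ID
# to the ID list only if it is not already listed.
# Assumption: the list holds one ID per line.
add_id() {
  id="$1"
  file="${2:-ids.txt}"

  # Nothing to do if the exact ID is already present as a whole line.
  if grep -qxF -- "$id" "$file" 2>/dev/null; then
    return 0
  fi

  # Make sure the file ends with a newline before appending.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi

  printf '%s\n' "$id" >> "$file"
}
```

After calling, for example, add_id A36645, the remaining steps are the ones listed above: run ./scripts/update --download and commit the changes.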
For development purposes, this repository provides an eXist DB integration that makes it easy to upload the TEI files into a local eXist database, where they are available to any XQuery scripts you want to run for analysis.
To set this up, copy the .env.sample file to .env and adjust the environment variables to your local eXist setup. (The defaults should work with a vanilla eXist DB installation on most systems.) Then run the init script to create and configure the database collection and upload the XQuery files:
cp .env.sample .env
# adjust .env
./scripts/init
Now you can upload either individual TEI files or the entire xml directory using the load script:
# load all files in xml/
./scripts/load
# load individual files
./scripts/load xml/A015*.xml
# usage
./scripts/load --help
Finally, an uploaded query can be executed with the query script:
./scripts/query plays.xq
./scripts/query 'speeches.xq?id=A36645'
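To run a parameterized query for every selected document, the call above could be wrapped in a loop. The following is a minimal sketch under stated assumptions: the function name query_all is hypothetical, ids.txt is assumed to hold one ID per line, skipping blank lines and lines starting with # is an added convenience rather than a documented feature of ids.txt, and the query is assumed to take the EarlyPrint ID in an id parameter, as speeches.xq does in the example.

```shell
# Hypothetical helper (not part of this repository): run one uploaded query
# once for every ID in an ID list.
#   $1  query name, e.g. speeches.xq
#   $2  ID list, defaults to ids.txt (assumed: one ID per line)
#   $3  command used to run the query, defaults to ./scripts/query
query_all() {
  query="$1"
  ids_file="${2:-ids.txt}"
  runner="${3:-./scripts/query}"

  while IFS= read -r id || [ -n "$id" ]; do
    # Skip blank lines and comment lines (an assumption about the file).
    case "$id" in
      ''|'#'*) continue ;;
    esac
    # Keep the runner from consuming the rest of the ID list on stdin.
    "$runner" "${query}?id=${id}" < /dev/null
  done < "$ids_file"
}
```

For example, query_all speeches.xq would call ./scripts/query once per ID in ids.txt. Passing echo as the third argument prints the query strings instead of executing them, which is a cheap way to check the list first.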
To support integration with editor plugins for Atom or Visual Studio Code, we also provide a template for an .existdb.json configuration file. The .existdb.json file is created when running the init script with the -j option:
./scripts/init -j