TEI XML-encoded Latin texts from the *Laudationes urbium Dalmaticarum* collection, freely available under a CC-BY license.
A sample showing how an XML database can interact with LiLa.
- The TEI XML texts are in the `txts` directory
- The XQuery scripts that create the database and transform the files are in the `xqscripts` directory
- The texts are tokenized into sentences (`s`; in poetry, the verse breaks are removed) and words (`w`). Words containing enclitics, such as *genusque*, are represented as `w` children of a `w` parent annotated with `ana="enclisis"`, and the enclitic itself carries the attribute value `@join="left"`
- The words are normalized with the `@norm` attribute
- LiLa URIs are added as `@lemmaRef` attribute values; when a word is missing from LiLa, the `@lemmaRef` value points to Logeion instead
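The encoding described above can be read back with any XML tool. The following minimal sketch uses only Python's standard `xml.etree.ElementTree`; it assumes the files use the standard TEI namespace (`http://www.tei-c.org/ns/1.0`), and its sample fragment is hypothetical, modelled on the description above rather than copied from `txts`. It lists the word tokens with their `@norm` values and rebuilds the surface text by attaching each `join="left"` enclitic to the preceding word.

```python
import xml.etree.ElementTree as ET

# Assumption: the files use the standard TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
S = f"{{{TEI_NS}}}s"
W = f"{{{TEI_NS}}}w"

# Hypothetical fragment following the encoding described above (not taken from txts/).
SAMPLE = f"""<p xmlns="{TEI_NS}">
  <s>
    <w norm="urbs">Urbs</w>
    <w ana="enclisis">
      <w norm="genus">genus</w>
      <w join="left" norm="que">que</w>
    </w>
  </s>
</p>"""


def word_tokens(s):
    """Yield (text, norm, joins_left) for every leaf w in sentence s, in document order.

    The text of an enclisis parent is carried by its w children (host word and
    enclitic), so the parent itself is skipped and the children are yielded.
    """
    for w in s.iter(W):
        if w.find(W) is not None:  # a w with w children, e.g. ana="enclisis"
            continue
        yield (w.text or "", w.get("norm"), w.get("join") == "left")


def sentence_text(s):
    """Rebuild the sentence text, gluing join="left" tokens to the preceding word."""
    words = []
    for text, _norm, joins_left in word_tokens(s):
        if joins_left and words:
            words[-1] += text
        else:
            words.append(text)
    return " ".join(words)


root = ET.fromstring(SAMPLE)
sentence = root.find(S)
print(sentence_text(sentence))                         # Urbs genusque
print([norm for _, norm, _ in word_tokens(sentence)])  # ['urbs', 'genus', 'que']
```

Keeping the enclitic as its own token means *genus* and *que* can each receive their own `@norm` and `@lemmaRef`, while the written form *genusque* stays recoverable.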
- Download the files or clone the repository.
- Install BaseX (or another XML database).
- From BaseX, open and run the script `create-laurdal-db.xq` to create the XML database `laurdal-lila`; open the database and inspect it to confirm that the `w` elements have no `@lemma` attributes.
- Close the `laurdal-lila` database.
- From BaseX, open and run the script `query-lila.xq`; it adds a `@lemma` attribute to each `w` element in the `laurdal-lila` database. The attribute values are retrieved by querying LiLa online through the LiLa URIs (for three short files, the script takes 43052.4 ms, about 43 seconds, to run on my machine).
- From BaseX, open the `laurdal-lila` database and inspect it (you should see the new `@lemma` attributes with their values), or export the updated files to any directory.
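The annotation step performed by `query-lila.xq` can also be sketched in Python. This is not the repository's script, only an illustration under stated assumptions: `resolve` is a hypothetical function that turns a LiLa URI into a lemma label (a real one would query LiLa over the network and is left out here), the `LILA_PREFIX` check is an assumed way of skipping Logeion links, and the URIs in the sample are made up. Because every lookup is a network round trip, the sketch caches each distinct URI so that a lemma occurring many times is resolved only once.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Optional

TEI_NS = "http://www.tei-c.org/ns/1.0"
W = f"{{{TEI_NS}}}w"
# Assumption: LiLa URIs start with this prefix; anything else (e.g. a Logeion link) is skipped.
LILA_PREFIX = "http://lila-erc.eu/"


def add_lemmas(root: ET.Element, resolve: Callable[[str], Optional[str]]) -> int:
    """Set @lemma on every w whose @lemmaRef is a LiLa URI; return how many were set.

    `resolve` is a hypothetical URI -> lemma-label lookup. Each distinct URI is
    passed to it only once, because in practice every call is a network request.
    """
    cache: Dict[str, Optional[str]] = {}
    added = 0
    for w in root.iter(W):
        ref = w.get("lemmaRef")
        if ref is None or not ref.startswith(LILA_PREFIX):
            continue
        if ref not in cache:
            cache[ref] = resolve(ref)
        label = cache[ref]
        if label is not None:
            w.set("lemma", label)
            added += 1
    return added


# Hypothetical sample: the URIs below are made up for illustration.
SAMPLE = f"""<s xmlns="{TEI_NS}">
  <w lemmaRef="http://lila-erc.eu/data/id/lemma/1">Urbs</w>
  <w lemmaRef="http://lila-erc.eu/data/id/lemma/1">urbis</w>
  <w lemmaRef="https://logeion.uchicago.edu/Ragusium">Ragusium</w>
</s>"""

# Stand-in for a real LiLa query: a fixed table, with every call recorded.
FAKE_LABELS = {"http://lila-erc.eu/data/id/lemma/1": "urbs"}
calls = []


def fake_resolve(uri: str) -> Optional[str]:
    calls.append(uri)
    return FAKE_LABELS.get(uri)


root = ET.fromstring(SAMPLE)
added = add_lemmas(root, fake_resolve)
print(added, len(calls))  # 2 1
```

Passing the lookup in as a parameter keeps the tree-walking logic testable offline; replacing `fake_resolve` with a real LiLa query leaves `add_lemmas` unchanged.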
- Neven Jovanović (nevenjovanovic), Department of Classical Philology, Faculty of Humanities and Social Sciences, University of Zagreb; orcid.org/0000-0002-9119-399X