This project extracts grammatical information on glosses from the Würzburg glosses lexicon (Kavanagh 2001).
Starting from the PDF version of the lexicon, use pdf2html.py to convert it to HTML (using PDFMiner), then use cleanhtml.py to remove tags (using BeautifulSoup).
After preprocessing, run.py extracts the grammatical information from the lexicon.
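The tag-removal step can be illustrated with a minimal sketch. This is not the code in cleanhtml.py, which uses BeautifulSoup; the version below swaps in Python's standard-library `html.parser` so that it runs without extra dependencies. The names `TagStripper` and `strip_tags` and the sample markup are hypothetical and do not come from the repository or the lexicon.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect the text nodes of an HTML document and drop all tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &uuml; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text found between tags.
        self.chunks.append(data)


def strip_tags(html):
    """Return the text content of `html` with whitespace normalised."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    # Join the text nodes, then collapse runs of whitespace to single spaces.
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__":
    # Invented markup, loosely resembling HTML converted from a PDF.
    sample = "<div><span>headword</span> <b>pres. ind.</b> 3 sg.</div>"
    print(strip_tags(sample))
```

With BeautifulSoup installed, a comparable result would typically come from calling `get_text()` on the parsed document; whether cleanhtml.py does exactly that is an assumption.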
This project comes with a small web application (built with Flask) that lets you run the extraction for a single gloss, in case something went wrong during the automatic phase.
The web application can be started by running web.py.
This work is shared under a BSD 3-Clause licence. See LICENSE for more information.
To cite this repository, please use the metadata provided in CITATION.cff.
Würzburg glosses extraction is developed by Martijn van der Klis and the Research Software Lab at the Centre for Digital Humanities, Utrecht University.
For questions or suggestions, contact the Centre for Digital Humanities or open an issue in this repository.
Kavanagh, Seamus (2001). A lexicon of the Old Irish glosses in the Würzburg Manuscript of the Epistles of St. Paul. Edited by Dagmar S. Wodtko. Österreichische Akademie der Wissenschaften.