Extracts lemmata, forms and loci from the Würzburg glosses

Würzburg glosses extraction

This tool extracts grammatical information on glosses from the lexicon of the Würzburg glosses (Kavanagh 2001).

Usage

Starting from the PDF version of the lexicon, use pdf2html.py to convert it to HTML (this uses PDFMiner), then use cleanhtml.py to remove the tags (this uses BeautifulSoup).
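The two preprocessing steps can be sketched roughly as follows. This is a minimal illustration and not the code in pdf2html.py or cleanhtml.py: the function names `pdf_to_html` and `html_to_text` are hypothetical, and the sketch assumes the pdfminer.six and beautifulsoup4 packages are installed.

```python
from io import StringIO

from bs4 import BeautifulSoup


def pdf_to_html(pdf_path):
    """Convert a PDF file to an HTML string (hypothetical helper)."""
    # Imported here so that the cleaning step below can be used
    # without pdfminer.six being installed.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    output = StringIO()
    with open(pdf_path, "rb") as pdf_file:
        # codec=None makes PDFMiner write str rather than bytes.
        extract_text_to_fp(pdf_file, output, laparams=LAParams(),
                           output_type="html", codec=None)
    return output.getvalue()


def html_to_text(html):
    """Strip all tags and return the text content (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Join text fragments with single spaces and drop empty fragments.
    return soup.get_text(separator=" ", strip=True)
```

Under these assumptions, `html_to_text(pdf_to_html("lexicon.pdf"))` would yield plain text for a PDF stored as `lexicon.pdf` (an example file name). The actual scripts may keep more structure than this, such as font or layout information needed by the extraction step.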

After preprocessing, run.py extracts the grammatical information from the lexicon.

Web application

This project comes with a small web application (built with Flask) that allows you to run the extraction for a single gloss, in case something went wrong during the automatic phase. The web application can be started by running web.py.
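For readers unfamiliar with Flask, a single-gloss endpoint might look roughly like the sketch below. It is not the code in web.py: the route `/extract`, the query parameter `gloss` and the function `extract_single_gloss` are all hypothetical, and the extraction itself is replaced by a placeholder that only splits the input on whitespace.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def extract_single_gloss(text):
    """Placeholder for the real extraction logic (hypothetical)."""
    # The real project would derive lemmata, forms and loci here;
    # this stand-in only echoes the input and its whitespace tokens.
    return {"input": text, "tokens": text.split()}


@app.route("/extract")
def extract():
    # Read the gloss from the query string, e.g. /extract?gloss=...
    text = request.args.get("gloss", "")
    if not text:
        return jsonify(error="missing 'gloss' parameter"), 400
    return jsonify(extract_single_gloss(text))


if __name__ == "__main__":
    # Start Flask's development server.
    app.run(debug=True)
```

Returning JSON keeps the sketch short; the actual application may render an HTML form instead.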

Licence

This work is shared under a BSD 3-Clause licence. See LICENSE for more information.

Citation

To cite this repository, please use the metadata provided in CITATION.cff.

Contact

Würzburg glosses extraction is developed by Martijn van der Klis and the Research Software Lab at the Centre for Digital Humanities, Utrecht University.

For questions or suggestions, contact the Centre for Digital Humanities or open an issue in this repository.

References

Kavanagh, Seamus (2001). A lexicon of the Old Irish glosses in the Würzburg Manuscript of the Epistles of St. Paul. Edited by Dagmar S. Wodtko. Österreichische Akademie der Wissenschaften.