German Rhyme Corpus
This is a diachronically balanced sample of German poetry, manually annotated for rhyme.
Notably, almost one third of the stanzas do not rhyme at all, a fact that is often overlooked when building a rhyme corpus. Whether a stanza rhymes depends heavily on its length.
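To see how the share of unrhymed stanzas varies with stanza length, one can group stanzas by their number of lines. The sketch below assumes that each stanza's annotation has already been extracted as a rhyme-scheme string with one letter per line (e.g. "abab"). Lines that share a letter rhyme, and every unrhymed line gets its own letter. This string representation and the helper names `stanza_rhymes` and `non_rhyming_share_by_length` are hypothetical, not the corpus's documented format, and the example input is toy data, not drawn from the corpus.

```python
from collections import defaultdict


def stanza_rhymes(scheme: str) -> bool:
    """Return True if at least one rhyme letter occurs on two or more lines.

    Assumption (hypothetical encoding): every unrhymed line has its own
    distinct letter, so a stanza with all-distinct letters does not rhyme.
    """
    return len(set(scheme)) < len(scheme)


def non_rhyming_share_by_length(schemes):
    """Map stanza length (number of lines) to the share of stanzas with no rhyme."""
    totals = defaultdict(int)
    unrhymed = defaultdict(int)
    for scheme in schemes:
        n = len(scheme)
        totals[n] += 1
        if not stanza_rhymes(scheme):
            unrhymed[n] += 1
    return {n: unrhymed[n] / totals[n] for n in sorted(totals)}


# Toy rhyme schemes, not taken from the corpus.
schemes = ["abab", "abcd", "aabb", "ab", "ab", "aa", "abcde"]
print(non_rhyming_share_by_length(schemes))
```

Grouping by length keeps short stanzas (couplets, single lines), which rarely rhyme internally, from being mixed with longer stanzas that usually do.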
The corpus is encoded in TEI P5 and can be validated against the RELAX NG schema in the 'Schema' folder.
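A minimal sketch of validating a corpus file from Python follows. It assumes that the third-party lxml package is installed and that the schema in the 'Schema' folder uses the RELAX NG XML syntax (.rng) rather than the compact syntax. The function name `validate_tei` is hypothetical, and the actual schema and corpus file names are not given here.

```python
import sys

from lxml import etree


def validate_tei(xml_path: str, schema_path: str) -> bool:
    """Validate one TEI P5 file against a RELAX NG schema in XML syntax (.rng).

    Prints each validation error to stderr and returns True only if the
    document is valid.
    """
    relaxng = etree.RelaxNG(etree.parse(schema_path))
    doc = etree.parse(xml_path)
    ok = relaxng.validate(doc)
    if not ok:
        # error_log holds the messages from the last validate() call
        for error in relaxng.error_log:
            print(f"{error.filename}:{error.line}: {error.message}", file=sys.stderr)
    return ok
```

Usage would look like `validate_tei("path/to/poem.xml", "Schema/<schema>.rng")`, with both paths replaced by the actual files. Parsing the schema once and reusing the `RelaxNG` object is cheaper when validating many files.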
Its format is based on the conventions used in the German Text Archive (
If you use this corpus, please cite the following paper, which describes it in detail:
Haider, T., & Kuhn, J. (2018, August). Supervised Rhyme Detection with Siamese Recurrent Networks. In Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (pp. 81-86).