German Rhyme Corpus

This is a diachronically balanced sample of German poetry, manually annotated for rhyme.

Notably, almost one third of the stanzas do not rhyme at all, a fact that is often overlooked when building a rhyme corpus (how likely a stanza is to rhyme depends heavily on its length).

The corpus is encoded in TEI P5 and can be validated against the RELAX NG schema in the 'Schema' folder.
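Validation against a RELAX NG schema can be scripted. The following is a minimal sketch, assuming the third-party `lxml` package is installed; the helper name `validate_tei` and the toy schema and documents are hypothetical stand-ins so that the example runs on its own. For the real corpus, load the schema from the 'Schema' folder and a corpus file with `etree.parse(...)` instead.

```python
from lxml import etree  # third-party package (assumption: installed via pip)


def validate_tei(doc, schema_doc):
    """Validate a parsed XML document against a parsed RELAX NG schema.

    Returns (is_valid, messages). Both arguments are lxml elements or trees.
    """
    schema = etree.RelaxNG(schema_doc)
    is_valid = schema.validate(doc)
    messages = [str(entry) for entry in schema.error_log]
    return is_valid, messages


# Toy schema for illustration only; it is NOT the schema shipped with the corpus.
TOY_SCHEMA = """
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="lg">
      <oneOrMore>
        <element name="l"><text/></element>
      </oneOrMore>
    </element>
  </start>
</grammar>
"""

schema_doc = etree.fromstring(TOY_SCHEMA)
good_doc = etree.fromstring("<lg><l>first line</l><l>second line</l></lg>")
bad_doc = etree.fromstring("<lg><p>not a verse line</p></lg>")

good_ok, _ = validate_tei(good_doc, schema_doc)
bad_ok, bad_messages = validate_tei(bad_doc, schema_doc)
```

Collecting the error log rather than raising keeps the helper usable in a loop over many files, where one invalid file should not stop the run.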

Its format follows the conventions of the German Text Archive (Deutsches Textarchiv, deutschestextarchiv.de).
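To illustrate how rhyme annotation in TEI can be read programmatically, here is a hedged sketch using only the Python standard library. It assumes stanzas are encoded as `<lg type="stanza">` elements carrying the standard TEI `rhyme` attribute (e.g. `rhyme="abab"`) with `<l>` children for the lines; the actual element layout of this corpus may differ, so check a corpus file first. The sample document, its placeholder lines, and the helpers `stanza_schemes` and `is_unrhymed` are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Toy document for illustration; the lines are placeholders, not corpus text.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<lg type="poem">
  <lg type="stanza" rhyme="abab">
    <l>line one</l><l>line two</l><l>line three</l><l>line four</l>
  </lg>
  <lg type="stanza" rhyme="abc">
    <l>line five</l><l>line six</l><l>line seven</l>
  </lg>
</lg>
</body></text></TEI>"""


def stanza_schemes(root):
    """Return (rhyme scheme, number of lines) for every stanza under root."""
    stanzas = root.iterfind(".//tei:lg[@type='stanza']", TEI_NS)
    return [
        (lg.get("rhyme", ""), len(lg.findall("tei:l", TEI_NS)))
        for lg in stanzas
    ]


def is_unrhymed(scheme):
    """Assumption: a stanza is unrhymed if no scheme letter occurs twice."""
    letters = [ch for ch in scheme if ch.isalpha()]
    return len(set(letters)) == len(letters)


schemes = stanza_schemes(ET.fromstring(SAMPLE))
unrhymed = [scheme for scheme, _ in schemes if is_unrhymed(scheme)]
```

Pairing each scheme with the stanza's line count makes it straightforward to tabulate how the share of unrhymed stanzas varies with stanza length.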

If you use this corpus, please cite the following paper, which describes it in detail:

Haider, T., & Kuhn, J. (2018, August). Supervised Rhyme Detection with Siamese Recurrent Networks. In Proceedings of the Second Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (pp. 81-86). https://www.aclweb.org/anthology/W18-4509/