This corpus comprises sonnets written in Spanish during the 16th and 17th centuries.
Each sonnet has been annotated in XML in accordance with the TEI standard. Besides the header and structural information, each sonnet includes the formal representation of each verse’s particular metrical pattern.
The pattern consists of a sequence of unstressed syllables (represented by the "-" sign) and stressed syllables ("+" sign). Thus, each verse’s metrical pattern is represented as follows:
<l n="1" met="---+---+-+-">Cuando me paro a contemplar mi estado,</l>
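As an illustration of how the `met` attribute can be read, the following minimal Python sketch parses the example line above and lists the positions of its stressed syllables. The helper name `stressed_positions` is hypothetical, and the snippet assumes a bare `<l>` element with no XML namespace; in the full TEI files the elements will normally sit in the TEI namespace, so the lookup would need adjusting.

```python
import xml.etree.ElementTree as ET


def stressed_positions(met):
    """Return the 1-based positions of stressed syllables ('+') in a pattern.

    Hypothetical helper, not part of the corpus tooling.
    """
    return [i for i, mark in enumerate(met, start=1) if mark == "+"]


# The example verse from the text, as a standalone element (no namespace).
line_xml = '<l n="1" met="---+---+-+-">Cuando me paro a contemplar mi estado,</l>'
line = ET.fromstring(line_xml)

met = line.get("met")  # the metrical pattern string
print(len(met), stressed_positions(met))  # prints: 11 [4, 8, 10]
```

For this verse the pattern has eleven syllables, with stresses on the fourth, eighth and tenth.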
To make the corpus as representative as possible, every author from the 16th and 17th centuries with more than 10 digitized sonnets available has been included.
All texts have been taken from the Biblioteca Virtual Miguel de Cervantes.
Currently, the corpus comprises more than 5,000 sonnets (more than 71,000 verses).
The metrical patterns have been annotated semi-automatically. First, all sonnets were processed by an automatic metrical scansion system, which assigns a metrical pattern to each verse. Second, part of the corpus was checked manually and the errors found were corrected.
The corpus is currently undergoing manual validation, and each sonnet records whether it has already been checked manually.
If you would like to cite this corpus for academic research purposes, please use this reference:
Navarro-Colorado, Borja; Ribes-Lafoz, María and Sánchez, Noelia (2016) "Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and Evaluation" Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) Portorož, Slovenia. (PDF)
This corpus is part of the ADSO project, developed at the University of Alicante and funded by Fundación BBVA.
If you require further information about the metrical annotation, please consult the Annotation Guide (in Spanish) or the following papers:
- Navarro-Colorado, Borja; Ribes-Lafoz, María and Sánchez, Noelia (2016) "Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and Evaluation" Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) Portorož, Slovenia.
- Navarro-Colorado, Borja (2015) "A computational linguistic approach to Spanish Golden Age Sonnets: metrical and semantic aspects" Computational Linguistics for Literature NAACL 2015, Denver (Co), USA (PDF).
The metrical annotation of this corpus is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
As for the texts: "this digital object is protected by copyright and/or related rights. This digital object is accessible without charge, but its use is subject to the licensing conditions set by the organization giving access to it. Further information available at http://www.cervantesvirtual.com/marco-legal/".