Statement on data source: The raw dataset was generously provided by Aaron Griffith as an .xlsx file.
The dataset for the current research was created by manually encoding each gloss according to the features outlined in the previous section. It consists of 174 observations (rows) and 64 variables (columns), for a total of 11,136 cells. Of the 174 observations, 74 are glosses mainly by hand A (24a1 to 26b17), and the remaining 100 are predominantly by hand B (65b6 to 67b21). The 64 variables result from exhaustively combining 16 linguistic or palaeographic features with four age strata: each feature, such as the -o/-a endings of the i-/u-stems, is paired with each of the four strata, yielding four variables per feature. Note that the dataset does not include the text of the glosses, only their “gloss number” (65b6, 24a1, etc.).
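The dimensions described above can be checked with a short sketch. The feature and stratum names below are hypothetical placeholders, since the actual column labels are not listed in this description; only the counts (16 features, four strata, 74 and 100 glosses) come from the text.

```python
from itertools import product

# Hypothetical placeholder labels: the real feature and stratum names
# are not given in this description, only their counts.
features = [f"feature_{i:02d}" for i in range(1, 17)]  # 16 features
age_strata = [f"stratum_{j}" for j in range(1, 5)]     # 4 age strata

# Exhaustive combination: every feature paired with every age stratum.
variables = [f"{feat}__{stratum}" for feat, stratum in product(features, age_strata)]

n_hand_a = 74    # glosses mainly by hand A (24a1 to 26b17)
n_hand_b = 100   # glosses predominantly by hand B (65b6 to 67b21)
n_observations = n_hand_a + n_hand_b

# Total number of cells in the observations-by-variables table.
n_cells = n_observations * len(variables)

print(len(variables))   # 64
print(n_observations)   # 174
print(n_cells)          # 11136
```

The double-underscore naming is only one possible convention for marking which feature and stratum a column belongs to; the actual files may label their columns differently.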
- 'Pavia Glosses.xlsx' is the raw dataset, without any modification.
- 'Pavia Glosses_no missing_weighted.xlsx' is the complete weighted dataset, in which all missing values have been treated, with two additional columns containing the weighting information.