translate TranskribusMetadata into v>=2018 MetadataItem
bertsky opened this issue · 2 comments
The valuable information does not have to be removed. Transforming not just the attributes, but also its recursive Property elements into MetadataItem Labels is worthwhile IMO.
I believe the schema under https://github.com/Transkribus/TranskribusPageformat/blob/master/pagecontent_extension.xsd is not up to date anymore. The current version seems to be under https://gitlab.com/readcoop/transkribus/TranskribusCore/-/blob/master/src/main/resources/xsd/pagecontent_extension.xsd
Hi, @bertsky;
It may be true that the attributes contain valuable information, but because their values are not assigned from a controlled vocabulary, it is difficult to convert them into real PAGE values. For this reason, I would ignore them for now.
key value documentation:
key of the property - this could pre specified keys like "lang", "layout", "year_from", "year_to", "style", "weight" or meta data from the automatic process "editor", "editordate" or user defined properties "numbering"
comment: There are suggestions for keys in the documentation. The author or the system does not have to follow these.
Sure, without an actual schema it's risky to rely on these values. But at least for humans this is always useful. And there are no restrictions for names or text under PAGE's MetadataItem/Labels/Label element.
And once you accept the risk of being incomplete or becoming outdated when parsing them into standard metadata formats, it might still be worth trying, if ambitious. (Merely converting them to a syntactically allowed representation under MetadataItem would only be a prerequisite to that additional goal.)