quickClean
Primary Actor
Metadata Specialist
Scope
data integration tool preprocessing
Story
It seems that no matter how hard we try, we can never clean our data sufficiently. We would like our integrated workflow interface to test for some of the most commonly encountered kinds of "dirtiness", so that they can be cleaned without launching a separate suite of tools. For example, we would like to check whether all our IRIs comply with the applicable RFCs, and we would like to run some sort of "internal reconciliation" to see whether we have described the same entity with different IRIs. Above all, we don't want processing to break because of dirty data, especially when it is a lengthy process. A check for the specific inconsistencies that can break processing would therefore be useful: a space following a < in a Turtle IRI, say, or an RDF/XML file that is not well-formed XML, perhaps because of a prefix in an attribute, and so on. At the very least, we need clear and precise error reporting when processing is interrupted.
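A minimal Python sketch of the two "may break the processing" checks the story names, with every problem reported as `source:line:column: message`. The names `Problem`, `check_turtle_iris` and `check_xml_well_formed` are hypothetical and not part of any existing tool. The IRI scan is a heuristic built on the characters that the Turtle 1.1 `IRIREF` production forbids. It treats every `<` as the start of an IRI, so a `<` inside a string literal, a comment, or RDF-star `<< >>` syntax can produce false positives. It does not attempt full RFC 3987 validation or the internal reconciliation step. The XML check uses the standard-library `xml.etree.ElementTree` parser, whose `ParseError.position` gives the line and column of the failure.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Problem:
    source: str   # file name or label (hypothetical reporting format)
    line: int     # 1-based line number
    column: int   # 1-based column number
    message: str

    def __str__(self) -> str:
        return f"{self.source}:{self.line}:{self.column}: {self.message}"


# Characters the Turtle 1.1 IRIREF production forbids inside <...>:
# U+0000..U+0020 (control characters and space) plus < > " { } | ^ ` \
_FORBIDDEN = set('<>"{}|^`\\') | {chr(c) for c in range(0x21)}
# A backslash is only allowed as part of a \uXXXX or \UXXXXXXXX escape.
_UCHAR = re.compile(r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}")


def check_turtle_iris(text: str, source: str = "<turtle>") -> List[Problem]:
    """Heuristic: every '<' is assumed to open an IRI that closes on the same line."""
    problems: List[Problem] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        start = line.find("<")
        while start != -1:
            end = line.find(">", start + 1)
            if end == -1:
                problems.append(Problem(source, lineno, start + 1,
                                        "unterminated IRI (no '>' on this line)"))
                break
            body = line[start + 1:end]
            i = 0
            while i < len(body):
                escape = _UCHAR.match(body, i)
                if escape:  # skip a valid numeric escape as a unit
                    i = escape.end()
                    continue
                ch = body[i]
                if ch in _FORBIDDEN:
                    # body[i] sits at 0-based index start + 1 + i in the line
                    problems.append(Problem(
                        source, lineno, start + 2 + i,
                        f"character {ch!r} (U+{ord(ch):04X}) not allowed in IRI"))
                i += 1
            start = line.find("<", end + 1)
    return problems


def check_xml_well_formed(text: str, source: str = "<rdfxml>") -> List[Problem]:
    """Report the first well-formedness (or namespace) error, if any."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, col = err.position  # expat: 1-based line, 0-based column
        return [Problem(source, line, col + 1, f"not well-formed XML: {err}")]
    return []


if __name__ == "__main__":
    ttl = "<http://example.org/a> <http://example.org/p> < http://example.org/b> .\n"
    rdfxml = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
              '  <rdf:Description foaf:name="x"/>\n'
              "</rdf:RDF>\n")
    for problem in check_turtle_iris(ttl, "data.ttl") + check_xml_well_formed(rdfxml, "data.rdf"):
        print(problem)
```

Running both checks as a cheap pre-pass, before a long conversion job starts, reflects the story's priority: the job should fail early with a file, line and column, not partway through. In the RDF/XML sample, the `foaf:` prefix is used in an attribute without being declared, and the namespace-aware parser rejects it as an unbound prefix.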