Implement sorting
Graph data has no natural sort order, so NQuads to HOWL conversion currently results in arbitrary ordering of subjects and statements. We should add consistent, idempotent sorting methods for HOWL output. Sorting should be optional, but we might decide to sort by default for certain conversions.
The sort order has to make sense for the human reader, so she can quickly find the information she's looking for in the file. I think we have to divide the sort into "tiers".
Some predicates are more important than others. I prefer `rdfs:label` and `rdf:type` to come immediately after the subject, and other users will have different preferences. The easiest way I can think of for users to specify preferred predicate order is by specifying the LABELs for the predicates in the preferred order. So we should track the order of LABELs and use that as the first tier for sorting statements.
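A rough sketch of the "track the order of LABELs" idea, in Python for illustration only. The function names and the plain list of label strings are assumptions made for this example, not part of HOWL. Predicates whose labels were declared sort first, in declaration order; anything undeclared falls through to an alphabetical tail.

```python
def label_ranks(declared_labels):
    """Map each label to the position of its first declaration."""
    ranks = {}
    for label in declared_labels:
        # setdefault keeps the first position if a label is declared twice.
        ranks.setdefault(label, len(ranks))
    return ranks


def sort_by_declared_order(predicate_labels, declared_labels):
    """Sort declared labels by declaration order, then the rest alphabetically."""
    ranks = label_ranks(declared_labels)
    unranked = len(ranks)  # shared rank for labels that were never declared
    return sorted(
        predicate_labels,
        key=lambda label: (ranks.get(label, unranked), label),
    )


declared = ["label", "type", "comment"]
used = ["comment", "see also", "type", "label", "alternative term"]
print(sort_by_declared_order(used, declared))
```

Because the key depends only on the label and the declared list, sorting already-sorted output leaves it unchanged, which is the idempotence we want.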
The custom order is most important for predicates. I don't know if it should also apply to subjects and graphs. If not, then the next tier should be alphanumeric by label. It would be better (but more expensive) to use natural number sorting for labels.
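To make the "natural number sorting" option concrete, here is a minimal Python sketch. The name `natural_key` and the choice to ignore case are assumptions for illustration. It splits each label into text and digit runs so that digit runs compare as integers, which is where the extra cost comes from.

```python
import re

_DIGIT_RUN = re.compile(r"(\d+)")


def natural_key(label):
    """Sort key that compares digit runs numerically and text case-insensitively."""
    # Splitting on a capturing group puts text at even indexes and digit runs
    # at odd indexes, so two keys always compare like with like.
    parts = _DIGIT_RUN.split(label.casefold())
    chunks = [int(part) if index % 2 else part for index, part in enumerate(parts)]
    # The raw label breaks ties between labels that differ only in case.
    return (chunks, label)


labels = ["item 10", "item 2", "Item 1"]
print(sorted(labels, key=natural_key))
```

A plain string sort would place "item 10" before "item 2", which is what a human reader is likely to find surprising.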
The next tiers should be alphanumeric prefixed names, then alphanumeric IRIs, then alphanumeric blank nodes.
- graphs: default graph first, then named graphs in tiers: labels, prefixed names, IRIs
- subjects: labels, prefixed names, IRIs, blank nodes
- statements: LABELs, prefixed names, IRIs; then by object's lexical value (keeping annotations together).
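
The tiers above could be expressed as a single sort key. This is only a sketch in Python: the `Term` class is a hypothetical stand-in, not HOWL's data model, statements are simplified to (predicate, object lexical value) pairs, and the label tier is compared alphabetically here, although a declared-LABEL rank could replace it. Keeping annotations together is not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """Hypothetical view of an RDF term, for illustration only."""
    iri: Optional[str] = None       # None for blank nodes
    label: Optional[str] = None     # label, if one is known
    prefixed: Optional[str] = None  # prefixed name, if a prefix applies
    blank_id: Optional[str] = None  # blank node identifier


def tier_key(term):
    """Tier number first, then the string used for ordering within the tier."""
    if term.label is not None:
        return (0, term.label)
    if term.prefixed is not None:
        return (1, term.prefixed)
    if term.iri is not None:
        return (2, term.iri)
    return (3, term.blank_id or "")


def graph_key(graph):
    """Default graph (represented as None) sorts before every named graph."""
    return (-1, "") if graph is None else tier_key(graph)


def sort_subjects(subjects):
    return sorted(subjects, key=tier_key)


def sort_statements(statements):
    """Order (predicate, object_lexical) pairs by predicate tier, then object."""
    return sorted(statements, key=lambda s: (tier_key(s[0]), s[1]))
```

Each key is a pure function of the term, so running the sort on its own output returns the same order.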
We could also offer a simpler sort order: alphanumeric by IRI, with blank nodes last.
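
The simpler order is easy to sketch. This Python example assumes terms are plain strings, with blank nodes written using the `_:` prefix as in N-Quads and IRIs written without angle brackets; `simple_key` is a name made up for the example.

```python
def simple_key(term):
    """IRIs first in alphanumeric order, then blank nodes."""
    # False sorts before True, so the flag pushes blank nodes to the end.
    return (term.startswith("_:"), term)


terms = ["_:b1", "http://example.com/b", "_:b0", "http://example.com/a"]
print(sorted(terms, key=simple_key))
```

The explicit flag matters because `_` sorts before lowercase letters in a plain string comparison, which would otherwise put blank nodes ahead of these IRIs.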