Add tests for integrity of merged contexts
cthoyt opened this issue · 1 comments
#15 introduced data integrity tests, but did not apply them to the merged contexts, because there seems to be some issues with the logic that generates them.
The merged
and merged.oak
strings should be removed from the following (effectively leaving skip = set()
to go along with any updates that it takes to generate merged contexts with bijectiveness guarantees.
If we don't skip these, then
def test_namespace_aliases(self):
"""Test that prefix aliases have a valid namespace."""
will fail, but this is expected.
this is because we have:
merged,sdo,https://schema.org/,namespace_alias
from prefix.cc -- which is junk because, the correct semantic URL uses http not https, so there is no direct deterministic join point with the correct canonical prefix or IRI:
merged,schema,http://schema.org/,canonical
so this is effectively orphaned
now, we could extend the test to do a transitive walk over aliases and use the following as a join point:
merged,schema,https://schema.org/,prefix_alias
but this would be overkill, and not necessary, as the fundamental assumption here doesn't hold - the merged context does have integrity, it provides the correct bijective mapping for schema.org, according to the precedence rules in which the merged context is built (with the junky prefix.cc having lowest priority).
The thing to remember here is that the only thing the API exposes is the canonical mappings, the rest is just there for debugging purposes. It could be argued that a cleaner design would be for the CSVs to only include the canonical mappings for that context, and to put additional anciliary metadata that arises from the ETL elsewhere - a separate issue could be made for this if it's a priority (this would only be a priority for making the library easier to understand, as it wouldn't affect the users of the library)