mapping-commons/sssom

Docs: Are built-in prefixes in metadata required, or recommended?

Closed this issue · 3 comments

Overview

Just a request to have documentation on this question:
Are built-in prefixes in metadata required, or recommended?

Additional info

Thoughts
I prefer to be minimalist, to the point that I want a prefix in my metadata if and only if it appears in the mapping set or in any of the rows of my TSV. But I do see the UX benefits of including common built-ins.

Related

@gouttegd has already documented this here:

https://mapping-commons.github.io/sssom/spec/#tsv

The YAML metadata block MUST contain a curie map that allows the unambiguous interpretation of CURIES. A curie map is supplied after a curie_map: parameter in the yaml file. The value is a dictionary of CURIE->URLPREFIX pairs. Note that the following prefixes are built-in and (1) MUST NOT be changed from their SSSOM default interpretation and (2) MAY be omitted from the curie map: "sssom", "owl", "rdf", "rdfs", "skos", "semapv".

So, you are right, they are optional, and sssom-py is a bit obsessive about adding these optional prefixes.

I plan to clarify that even further in my upcoming¹ overhaul of the spec by defining a canonical TSV serialisation.

The canonical serialisation is the serialisation that SSSOM/TSV writers will be recommended to adopt in order to minimise serialisation differences across implementations, and so avoid potentially huge and meaningless diffs when an SSSOM/TSV file is modified by different tools. It’s a generalisation of the logic by which we are already recommending that TSV columns should be written in a spec-defined order.

Regarding the prefix map, the “canonical” guideline will be that SSSOM/TSV writers should write the minimal effective prefix map – that is, the prefix map should only contain prefix names that (1) are not already built-in, and (2) are actually used somewhere in the mapping set.
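
For illustration, the writer-side guideline could be sketched roughly as below. This is only a sketch under assumptions, not sssom-py's actual behaviour or the final wording of the spec: the function name `minimal_prefix_map` is hypothetical, the example prefixes are made up for the demo, and the caller is assumed to have already collected every CURIE used in the set (rows and set-level metadata).

```python
# Built-in prefix names, as listed in the spec excerpt quoted above.
BUILTIN_PREFIX_NAMES = frozenset({"sssom", "owl", "rdf", "rdfs", "skos", "semapv"})


def minimal_prefix_map(prefix_map, curies):
    """Return only the entries a writer would need to emit.

    prefix_map: dict of prefix name -> URL prefix (possibly with extras).
    curies: iterable of every CURIE string used in the mapping set
            (hypothetical input; collecting them is left to the caller).
    """
    # Prefix names that actually occur in the set.
    used = {curie.partition(":")[0] for curie in curies if ":" in curie}
    # Keep a name only if it is used AND not already built-in.
    return {
        name: url
        for name, url in prefix_map.items()
        if name in used and name not in BUILTIN_PREFIX_NAMES
    }


# Illustrative data only.
example_map = {
    "skos": "http://www.w3.org/2004/02/skos/core#",  # built-in, can be omitted
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "MP": "http://purl.obolibrary.org/obo/MP_",
    "DOID": "http://purl.obolibrary.org/obo/DOID_",  # declared but never used
}
example_curies = ["HP:0000001", "skos:exactMatch", "MP:0000001"]

print(minimal_prefix_map(example_map, example_curies))
```

With this input, `skos` is dropped because it is built-in and `DOID` is dropped because nothing in the set uses it.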

But that’s only a recommendation for writers. For SSSOM/TSV readers, all that matters is that (1) there are no prefix names in the set that are not declared in the prefix map, with the exception of the built-in prefix names, and (2) if the built-in prefix names are declared, they must point to the same prefixes as in the spec.

So it is never wrong for a set to have a prefix map that contains superfluous prefix names (names that are built-in and/or that are never used in the set), and readers must never reject a set because of that. It is simply recommended that writers avoid generating such maps.
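
The reader-side rules can be sketched in the same spirit. Again, this is a minimal sketch under assumptions rather than any existing implementation: `check_prefix_map` is a hypothetical helper, and the URL prefixes in `BUILTIN_PREFIX_MAP` are what I understand the spec's defaults to be, so they should be verified against the spec before being relied on.

```python
# Assumed default expansions of the built-in prefix names; verify against the spec.
BUILTIN_PREFIX_MAP = {
    "sssom": "https://w3id.org/sssom/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "semapv": "https://w3id.org/semapv/vocab/",
}


def check_prefix_map(prefix_map, curies):
    """Return a list of problems; an empty list means the set is acceptable.

    prefix_map: dict of prefix name -> URL prefix declared in the metadata.
    curies: iterable of every CURIE string used in the set (hypothetical input).
    """
    problems = []

    # Rule (2): a declared built-in must keep its default expansion.
    for name, url in prefix_map.items():
        expected = BUILTIN_PREFIX_MAP.get(name)
        if expected is not None and url != expected:
            problems.append(f"built-in prefix {name!r} is redefined as {url!r}")

    # Rule (1): every prefix name used must be declared or built-in.
    known = set(prefix_map) | set(BUILTIN_PREFIX_MAP)
    for curie in curies:
        name = curie.partition(":")[0]
        if name not in known:
            problems.append(f"prefix {name!r} in {curie!r} is not declared")

    # Superfluous entries (unused or built-in) are deliberately NOT reported.
    return problems
```

Note that the check never looks at whether a declared prefix is used, so a map padded with built-ins or unused names passes, which matches the point that readers must not reject such sets.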

--
¹ Yes, it is coming up. At some point. Eventually.

Thanks for the helpful responses! This is already done, then! Closing.