linkml/prefixmaps

Where should improved `prefixmaps` to `curies.Converter` code live?

cthoyt opened this issue · 3 comments

Right now, there's a pretty substantial loss of content in examples like:

from prefixmaps.io.parser import load_multi_context
from curies import Converter

# Load several contexts into one, then flatten it to a plain prefix -> namespace dict
ctxt = load_multi_context(["obo", "bioregistry.upper", "linked_data", "prefixcc"])
converter = Converter.from_prefix_map(ctxt.as_dict())

as all prefix and namespace synonyms are thrown away. I've implemented a more fully-featured function for converting a prefixcommons.Context object into a curies.Converter in biopragmatics/curies#22. The question is: should this code live in prefixmaps or curies?

from prefixmaps import load_context
from curies import Converter

# Option 1, if code lives in `curies`
context = load_context("obo")
converter = Converter.from_linkml_context(context)

# Option 2, if code lives in `prefixmaps`
context = load_context("obo")
converter = context.as_converter()
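To make the loss concrete, here is a minimal, library-free sketch of what happens when entries that carry synonyms are flattened to a plain prefix map. `PrefixRecord` and `flatten` are hypothetical names made up for illustration, not part of `prefixmaps` or `curies`, and the synonym values are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class PrefixRecord:
    """Hypothetical stand-in for one context entry (not a prefixmaps/curies class)."""

    prefix: str
    namespace: str
    prefix_synonyms: list[str] = field(default_factory=list)
    namespace_synonyms: list[str] = field(default_factory=list)


def flatten(records: list[PrefixRecord]) -> dict[str, str]:
    """Keep only canonical prefix -> canonical namespace, like a plain prefix map."""
    return {record.prefix: record.namespace for record in records}


records = [
    PrefixRecord(
        prefix="CHEBI",
        namespace="http://purl.obolibrary.org/obo/CHEBI_",
        prefix_synonyms=["chebi"],  # illustrative synonym
        namespace_synonyms=["https://identifiers.org/chebi/"],  # illustrative synonym
    ),
]

prefix_map = flatten(records)

# The canonical pair survives the flattening...
assert prefix_map["CHEBI"] == "http://purl.obolibrary.org/obo/CHEBI_"
# ...but the flat map has no place to keep either kind of synonym.
assert "chebi" not in prefix_map
```

Anything built only from `prefix_map` cannot recover the synonyms, which is why a dedicated conversion function is needed, wherever it ends up living.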

You definitely shouldn't do 1 (use internal ETL methods). 2 is better, as it uses methods that are exposed and for which we can provide guarantees of stability.

> as all prefix and namespace synonyms are thrown away.

Not seeing the problem here...? I think we may still need to mind meld a bit on the purpose of this library.

The synonyms are useless to the intended consumers of prefixmaps. We deliberately avoid Postel's principle here. You should always use the canonical prefix and canonical namespace, not an alias, when working in any kind of linked data context.

The aliases in the CSVs are just there so that internal maintainers can more easily view contexts. We could move them out to a separate reporting file, but the CSVs are not intended to be seen by anyone other than internal maintainers.

The synonyms are useful for the purpose of prefix normalization, especially in cases where the source is no longer updated, so we don't have anyone to ask to correct their IDs.
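The two positions in this thread can be sketched side by side. The following is a minimal, library-free illustration, assuming a hypothetical `normalize_curie` helper and made-up synonym data. It is not the `curies` or `prefixmaps` API. With `strict=True` only canonical prefixes are accepted. Otherwise, known synonyms are rewritten to the canonical prefix.

```python
from typing import Optional

# Hypothetical data for illustration only.
CANONICAL = {"CHEBI": "http://purl.obolibrary.org/obo/CHEBI_"}
SYNONYMS = {"chebi": "CHEBI", "ChEBI": "CHEBI"}  # alias -> canonical prefix


def normalize_curie(curie: str, *, strict: bool = False) -> Optional[str]:
    """Return the CURIE with a canonical prefix, or None if it can't be resolved."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        return None  # not a CURIE at all
    if prefix in CANONICAL:
        return curie  # already canonical
    if strict:
        return None  # strict consumers reject aliases outright
    canonical = SYNONYMS.get(prefix)
    if canonical is None:
        return None  # unknown prefix
    return f"{canonical}:{local_id}"


# Lenient mode repairs identifiers from a source that can no longer be fixed upstream.
assert normalize_curie("chebi:1234") == "CHEBI:1234"
# Strict mode only accepts the canonical form.
assert normalize_curie("chebi:1234", strict=True) is None
assert normalize_curie("CHEBI:1234", strict=True) == "CHEBI:1234"
```

The lenient path is only possible if the synonym table survives the conversion, which is the content a flat prefix map drops.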

Closed by #25