mapping-commons/sssom-py

Duplicate URI prefixes error during sssom parse

ehartley opened this issue · 1 comments

sssom parse raises a DuplicateURIPrefixes error when, with the -C merged option, a URI prefix defined in the input metadata.yml file is also mapped to a different prefix in the default prefix map.

For both versions 0.3.36 and 0.3.39, running:

sssom parse -I obographs-json -m config/metadata.yml -C merged -F IAO:0100001 -F oboInOwl:consider tmp/obsolete.json -o reports/obsolete.sssom.tsv

produces this error:

Traceback (most recent call last):
  File "/usr/local/bin/sssom", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sssom/cli.py", line 215, in parse
    parse_file(
  File "/usr/local/lib/python3.10/dist-packages/sssom/io.py", line 93, in parse_file
    mapping_predicates = get_list_of_predicate_iri(
  File "/usr/local/lib/python3.10/dist-packages/sssom/io.py", line 192, in get_list_of_predicate_iri
    p_iri = extract_iri(p, prefix_map)
  File "/usr/local/lib/python3.10/dist-packages/sssom/io.py", line 210, in extract_iri
    converter = Converter.from_prefix_map(prefix_map)
  File "/usr/local/lib/python3.10/dist-packages/curies/api.py", line 551, in from_prefix_map
    return cls(
  File "/usr/local/lib/python3.10/dist-packages/curies/api.py", line 333, in __init__
    raise DuplicateURIPrefixes(duplicate_uri_prefixes)
curies.api.DuplicateURIPrefixes: Duplicate URI prefixes:

http://id.nlm.nih.gov/mesh/:
        prefix='MESH' uri_prefix='http://id.nlm.nih.gov/mesh/' prefix_synonyms=[] uri_prefix_synonyms=[]
        prefix='mesh' uri_prefix='http://id.nlm.nih.gov/mesh/' prefix_synonyms=[] uri_prefix_synonyms=[]

http://uri.neuinfo.org/nif/nifstd/nlx_subcell_:
        prefix='NIF_Subcellular' uri_prefix='http://uri.neuinfo.org/nif/nifstd/nlx_subcell_' prefix_synonyms=[] uri_prefix_synonyms=[]
        prefix='nlx.sub' uri_prefix='http://uri.neuinfo.org/nif/nifstd/nlx_subcell_' prefix_synonyms=[] uri_prefix_synonyms=[]

http://purl.obolibrary.org/obo/EHDAA2_:
        prefix='RETIRED_EHDAA2' uri_prefix='http://purl.obolibrary.org/obo/EHDAA2_' prefix_synonyms=[] uri_prefix_synonyms=[]
        prefix='EHDAA2' uri_prefix='http://purl.obolibrary.org/obo/EHDAA2_' prefix_synonyms=[] uri_prefix_synonyms=[]

http://uri.neuinfo.org/nif/nifstd/nlx_anat_:
        prefix='NLXANAT' uri_prefix='http://uri.neuinfo.org/nif/nifstd/nlx_anat_' prefix_synonyms=[] uri_prefix_synonyms=[]
        prefix='nlx.anat' uri_prefix='http://uri.neuinfo.org/nif/nifstd/nlx_anat_' prefix_synonyms=[] uri_prefix_synonyms=[]

http://uri.neuinfo.org/nif/nifstd/birnlex_:
        prefix='BIRNLEX' uri_prefix='http://uri.neuinfo.org/nif/nifstd/birnlex_' prefix_synonyms=[] uri_prefix_synonyms=[]
        prefix='birnlex' uri_prefix='http://uri.neuinfo.org/nif/nifstd/birnlex_' prefix_synonyms=[] uri_prefix_synonyms=[]

http://purl.obolibrary.org/obo/NCIT_:
        prefix='ncithesaurus' uri_prefix='http://purl.obolibrary.org/obo/NCIT_' prefix_synonyms=[] uri_prefix_synonyms=[]
        prefix='NCIT' uri_prefix='http://purl.obolibrary.org/obo/NCIT_' prefix_synonyms=[] uri_prefix_synonyms=[]

The metadata.yml file defines the MESH, NIF_Subcellular, RETIRED_EHDAA2, NLXANAT, BIRNLEX, and ncithesaurus prefixes. However, the obsolete.json file being parsed uses both the RETIRED_EHDAA2 and EHDAA2 prefixes, as well as both the ncithesaurus and NCIT prefixes.

So the desired behavior would be to allow multiple prefixes with the same URI prefix. A notification about the duplicate URI prefixes might be helpful, but they shouldn't cause sssom parse to fail.
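
To illustrate the "notify but don't fail" behavior requested above, here is a minimal sketch in plain Python. It is not code from sssom-py or curies: the function names find_duplicate_uri_prefixes and warn_on_duplicates are hypothetical, and the prefix map is assumed to be a plain dict from prefix to URI prefix.

```python
import warnings
from collections import defaultdict
from typing import Dict, List


def find_duplicate_uri_prefixes(prefix_map: Dict[str, str]) -> Dict[str, List[str]]:
    """Group prefixes by URI prefix and keep only URI prefixes used more than once."""
    by_uri_prefix: Dict[str, List[str]] = defaultdict(list)
    for prefix, uri_prefix in prefix_map.items():
        by_uri_prefix[uri_prefix].append(prefix)
    return {
        uri_prefix: prefixes
        for uri_prefix, prefixes in by_uri_prefix.items()
        if len(prefixes) > 1
    }


def warn_on_duplicates(prefix_map: Dict[str, str]) -> Dict[str, List[str]]:
    """Emit one warning per duplicated URI prefix instead of raising (hypothetical helper)."""
    duplicates = find_duplicate_uri_prefixes(prefix_map)
    for uri_prefix, prefixes in duplicates.items():
        warnings.warn(f"URI prefix {uri_prefix} is shared by prefixes: {', '.join(prefixes)}")
    return duplicates


# Example data taken from the error output in this issue, plus one
# unaffected prefix (GO) added for illustration.
combined = {
    "RETIRED_EHDAA2": "http://purl.obolibrary.org/obo/EHDAA2_",
    "EHDAA2": "http://purl.obolibrary.org/obo/EHDAA2_",
    "ncithesaurus": "http://purl.obolibrary.org/obo/NCIT_",
    "NCIT": "http://purl.obolibrary.org/obo/NCIT_",
    "GO": "http://purl.obolibrary.org/obo/GO_",
}
duplicates = find_duplicate_uri_prefixes(combined)
```

This only covers the detection and reporting side. How the converter itself should treat the duplicated entries once they are detected is a separate design decision.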

This issue is related to #269.

This is a high priority issue to sort out.

The user-supplied metadata always takes precedence over whatever the Bioregistry EPM (extended prefix map) says. So here is what needs to happen:

  • Ensure that in metadata-only mode, the converter uses only the metadata (I think this is already the case)
  • Ensure that in merged mode, where some prefixes are supplied by the user and the rest are obtained from the curies converter:
    1. the user-supplied prefixes trump the converter prefixes
    2. the user-supplied uri-prefixes trump the converter uri-prefixes (this is what I think is going wrong here)

Technically speaking, if the user supplies a pair prefix: http://uri.pre.fix/ then

  1. all mentions of the prefix need to be removed from the context prior to constructing the converter
  2. all mentions of http://uri.pre.fix/ need to be removed from the context prior to constructing the converter
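
The two steps above can be sketched in plain Python as follows. This is a hedged illustration, not the sssom-py implementation: merge_prefix_maps is a hypothetical name, both the user metadata and the context are assumed to be plain dicts from prefix to URI prefix, and prefixes are compared exactly (so "mesh" and "MESH" count as different prefixes).

```python
from typing import Dict


def merge_prefix_maps(user: Dict[str, str], context: Dict[str, str]) -> Dict[str, str]:
    """Merge two prefix -> URI prefix maps so that user-supplied entries always win."""
    user_uri_prefixes = set(user.values())
    merged = {
        prefix: uri_prefix
        for prefix, uri_prefix in context.items()
        # Step 1: drop context entries that reuse a user-supplied prefix.
        if prefix not in user
        # Step 2: drop context entries that reuse a user-supplied URI prefix.
        and uri_prefix not in user_uri_prefixes
    }
    # Add the user-supplied pairs last; no conflicting context entry remains.
    merged.update(user)
    return merged


# Example data based on the error output in this issue, plus one
# unaffected context prefix (GO) added for illustration.
user_map = {
    "MESH": "http://id.nlm.nih.gov/mesh/",
    "ncithesaurus": "http://purl.obolibrary.org/obo/NCIT_",
}
context_map = {
    "mesh": "http://id.nlm.nih.gov/mesh/",
    "NCIT": "http://purl.obolibrary.org/obo/NCIT_",
    "GO": "http://purl.obolibrary.org/obo/GO_",
}
merged = merge_prefix_maps(user_map, context_map)
```

After this merge every URI prefix maps to exactly one prefix, so building a converter from the result should no longer hit the duplicate check. The trade-off is that the dropped context prefixes (here mesh and NCIT) are no longer known, so input that uses both ncithesaurus and NCIT, as in the original report, would need some other mechanism to resolve the dropped prefix. The records in the error output carry a prefix_synonyms field, which might be one place to keep them.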