Duplicate URI prefixes error during sssom parse
ehartley opened this issue · 1 comments
sssom parse gives a Duplicate URI prefixes error when there are overlapping URI expansions with different prefixes between an input metadata.yml file and the default prefixes when using the -C merged option.
For both version 0.3.36 and 0.3.39, running:
sssom parse -I obographs-json -m config/metadata.yml -C merged -F IAO:0100001 -F oboInOwl:consider tmp/obsolete.json -o reports/obsolete.sssom.tsv
produces this error:
Traceback (most recent call last):
File "/usr/local/bin/sssom", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sssom/cli.py", line 215, in parse
parse_file(
File "/usr/local/lib/python3.10/dist-packages/sssom/io.py", line 93, in parse_file
mapping_predicates = get_list_of_predicate_iri(
File "/usr/local/lib/python3.10/dist-packages/sssom/io.py", line 192, in get_list_of_predicate_iri
p_iri = extract_iri(p, prefix_map)
File "/usr/local/lib/python3.10/dist-packages/sssom/io.py", line 210, in extract_iri
converter = Converter.from_prefix_map(prefix_map)
File "/usr/local/lib/python3.10/dist-packages/curies/api.py", line 551, in from_prefix_map
return cls(
File "/usr/local/lib/python3.10/dist-packages/curies/api.py", line 333, in __init__
raise DuplicateURIPrefixes(duplicate_uri_prefixes)
curies.api.DuplicateURIPrefixes: Duplicate URI prefixes:
http://id.nlm.nih.gov/mesh/:
prefix='MESH' uri_prefix='http://id.nlm.nih.gov/mesh/' prefix_synonyms=[] uri_prefix_synonyms=[]
prefix='mesh' uri_prefix='http://id.nlm.nih.gov/mesh/' prefix_synonyms=[] uri_prefix_synonyms=[]
http://uri.neuinfo.org/nif/nifstd/nlx_subcell_:
prefix='NIF_Subcellular' uri_prefix='http://uri.neuinfo.org/nif/nifstd/nlx_subcell_' prefix_synonyms=[] uri_prefix_synonyms=[]
prefix='nlx.sub' uri_prefix='http://uri.neuinfo.org/nif/nifstd/nlx_subcell_' prefix_synonyms=[] uri_prefix_synonyms=[]
http://purl.obolibrary.org/obo/EHDAA2_:
prefix='RETIRED_EHDAA2' uri_prefix='http://purl.obolibrary.org/obo/EHDAA2_' prefix_synonyms=[] uri_prefix_synonyms=[]
prefix='EHDAA2' uri_prefix='http://purl.obolibrary.org/obo/EHDAA2_' prefix_synonyms=[] uri_prefix_synonyms=[]
http://uri.neuinfo.org/nif/nifstd/nlx_anat_:
prefix='NLXANAT' uri_prefix='http://uri.neuinfo.org/nif/nifstd/nlx_anat_' prefix_synonyms=[] uri_prefix_synonyms=[]
prefix='nlx.anat' uri_prefix='http://uri.neuinfo.org/nif/nifstd/nlx_anat_' prefix_synonyms=[] uri_prefix_synonyms=[]
http://uri.neuinfo.org/nif/nifstd/birnlex_:
prefix='BIRNLEX' uri_prefix='http://uri.neuinfo.org/nif/nifstd/birnlex_' prefix_synonyms=[] uri_prefix_synonyms=[]
prefix='birnlex' uri_prefix='http://uri.neuinfo.org/nif/nifstd/birnlex_' prefix_synonyms=[] uri_prefix_synonyms=[]
http://purl.obolibrary.org/obo/NCIT_:
prefix='ncithesaurus' uri_prefix='http://purl.obolibrary.org/obo/NCIT_' prefix_synonyms=[] uri_prefix_synonyms=[]
prefix='NCIT' uri_prefix='http://purl.obolibrary.org/obo/NCIT_' prefix_synonyms=[] uri_prefix_synonyms=[]
The metadata.yml file defined the MESH, NIF_Subcellular, RETIRED_EHDAA2, NLXANAT, BIRNLEX, and ncithesaurus prefixes. However, the obsolete.json file being parsed uses both the RETIRED_EHDAA2 and EHDAA2 prefixes and both the ncithesaurus and NCIT prefixes.
So, the desired functionality would be to allow multiple prefixes with the same URI. It might be helpful to get a notification about the duplicate URI prefixes, but they shouldn't cause sssom parse to fail.
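To illustrate the "notify but don't fail" behaviour, here is a minimal sketch that groups prefixes by URI prefix and emits a warning instead of raising. It assumes the prefix map is a plain dict of prefix to URI prefix; the function names are hypothetical and not part of sssom or curies.

```python
import warnings
from collections import defaultdict


def find_duplicate_uri_prefixes(prefix_map):
    """Return {uri_prefix: [prefixes]} for URI prefixes claimed by more than one prefix."""
    by_uri_prefix = defaultdict(list)
    for prefix, uri_prefix in prefix_map.items():
        by_uri_prefix[uri_prefix].append(prefix)
    return {
        uri_prefix: prefixes
        for uri_prefix, prefixes in by_uri_prefix.items()
        if len(prefixes) > 1
    }


def warn_on_duplicate_uri_prefixes(prefix_map):
    """Warn about each duplicated URI prefix rather than failing (hypothetical helper)."""
    duplicates = find_duplicate_uri_prefixes(prefix_map)
    for uri_prefix, prefixes in duplicates.items():
        warnings.warn(
            f"URI prefix {uri_prefix} is shared by prefixes: {', '.join(prefixes)}"
        )
    return duplicates
```

For the example in this issue, a map containing both RETIRED_EHDAA2 and EHDAA2 for http://purl.obolibrary.org/obo/EHDAA2_ would produce one warning and parsing could continue.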
This issue is related to #269.
This is a high priority issue to sort out.
The user-supplied metadata always entirely trumps whatever the Bioregistry EPM says. So what needs to happen:
- Ensure that in metadata-only mode, the converter only uses the metadata (I think this is the case)
- Ensure that in merged mode, where some prefixes are supplied by the user and the rest are obtained from the curies converter:
  1. the user-supplied prefixes trump the converter prefixes
  2. the user-supplied URI prefixes trump the converter URI prefixes (this is what I think is going wrong here)
Technically speaking, if the user supplies a pair prefix: http://uri.pre.fix/ then
- all mentions of the prefix need to be removed from the context prior to constructing the converter
- all mentions of http://uri.pre.fix/ need to be removed from the context prior to constructing the converter
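The two removal steps above can be sketched roughly as follows. This is only an illustration on plain dicts of prefix to URI prefix, not the actual sssom code; merge_prefix_maps is a hypothetical name, and the merged result would then be passed to the converter constructor.

```python
def merge_prefix_maps(user_map, default_map):
    """Merge two prefix maps so that user-supplied entries always win.

    A default entry is dropped if either its prefix or its URI prefix
    is already claimed by the user-supplied map (hypothetical helper).
    """
    user_uri_prefixes = set(user_map.values())
    merged = {
        prefix: uri_prefix
        for prefix, uri_prefix in default_map.items()
        # Step 1: remove mentions of user-supplied prefixes.
        # Step 2: remove mentions of user-supplied URI prefixes.
        if prefix not in user_map and uri_prefix not in user_uri_prefixes
    }
    # Add the user-supplied pairs last so they are always present.
    merged.update(user_map)
    return merged
```

Note that under this rule, a user entry mapping RETIRED_EHDAA2 to http://purl.obolibrary.org/obo/EHDAA2_ removes the default EHDAA2 entry, so CURIEs written with EHDAA2 would no longer expand unless the user also lists that prefix or it is kept as a synonym. The sketch also does not handle duplicates inside the user-supplied map itself.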