draeger-lab/ModelPolisher

Handling ncbigi annotations

mephenor opened this issue · 5 comments

ncbigi annotations are no longer supported by identifiers.org.
As far as I understand, the direct URL https://www.ncbi.nlm.nih.gov/protein/ could still be used to resolve them.
Should this be implemented?

Is there a clear reason why the identifiers.org team no longer supports this kind of ID? I'd suggest contacting them and asking. As a general remark, annotation in SBML is not restricted to identifiers.org references. Any valid URL can be attached to a qualifier within a controlled-vocabulary term. The only advantage of identifiers.org is that its IDs follow a common structure and that the maintainers guarantee stable resolvability.

I haven't asked the identifiers.org team yet, as the apparent reason is described in one of the posts at https://ncbiinsights.ncbi.nlm.nih.gov/tag/gi/, which states that

[...] more and more new sequence records will not be assigned a GI number, and so will never be retrievable using GI methods. But records that currently have a GI will always have that GI.

and that accession.version should be used instead. They also link to the original announcement of this change (https://www.ncbi.nlm.nih.gov/books/NBK431010/#news_03-02-2016-phase-out-of-GI-numbers).

For now I've kept the ncbigi URLs as they are, since they can still be resolved, even though they are no longer supported and their pattern cannot be validated.

Or would it be possible to obtain the corresponding accession.version entry for a given GI? If so, we could use that...
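One possible route for that lookup, sketched here as an assumption rather than anything ModelPolisher currently does: NCBI's E-utilities `efetch` endpoint accepts a GI as `id` and, with `rettype=acc`, returns the accession.version of a sequence record as plain text. The helper names `efetch_accession_url` and `gi_to_accession` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_accession_url(gi: str, db: str = "protein") -> str:
    """Build an efetch URL asking NCBI for the accession.version of a GI.

    Hypothetical helper; accepts "12345", "gi:12345" or "GI:12345".
    """
    number = gi.split(":", 1)[-1].strip()  # drop an optional "gi:" prefix
    if not number.isdigit():
        raise ValueError(f"not a GI number: {gi!r}")
    query = urlencode({"db": db, "id": number, "rettype": "acc", "retmode": "text"})
    return f"{EFETCH}?{query}"


def gi_to_accession(gi: str, db: str = "protein", timeout: float = 10.0) -> str:
    """Fetch the accession.version for a GI (performs a network request)."""
    with urlopen(efetch_accession_url(gi, db), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

For a batch of models this would need to respect NCBI's E-utilities rate limits (a few requests per second without an API key), so caching the GI to accession.version mapping is probably worthwhile.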

The important thing for identifiers.org links is that they match the regular-expression patterns defined in the MIRIAM registry. As long as that is the case, they are valid.
If that is not possible, it is, from my perspective, perfectly fine to use direct URLs. They have the caveat that they may stop resolving at some point, but that is much better than putting an invalid identifiers.org identifier on a resource.
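A minimal sketch of that decision rule, assuming the caller passes in the registry pattern for the namespace (or `None` when the namespace is no longer in the registry, as with ncbigi now). The function name `ncbigi_url`, the compact `https://identifiers.org/ncbigi:<id>` URL layout, and the example pattern `^(GI|gi)\:\d+$` are illustrative assumptions, not values taken from the registry.

```python
import re
from typing import Optional

NCBI_PROTEIN = "https://www.ncbi.nlm.nih.gov/protein/"


def ncbigi_url(identifier: str, registry_pattern: Optional[str] = None) -> str:
    """Return an annotation URL for a GI (hypothetical helper).

    Prefer identifiers.org only if the ID matches the registry pattern;
    otherwise fall back to a direct NCBI URL rather than emit an
    identifiers.org link that would fail validation.
    """
    if registry_pattern is not None and re.fullmatch(registry_pattern, identifier):
        # Assumed compact identifiers.org layout: <prefix>:<identifier>
        return f"https://identifiers.org/ncbigi:{identifier}"
    # Fallback: accept "12345" or "gi:12345" (any case) and link to NCBI directly.
    match = re.fullmatch(r"(?:gi:)?(\d+)", identifier.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a GI number: {identifier!r}")
    return NCBI_PROTEIN + match.group(1)
```

With no registry pattern, `ncbigi_url("gi:12345")` yields the direct NCBI URL; rejecting non-numeric IDs keeps the fallback from producing a URL that cannot resolve at all.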