FM-F1A: Identifier uniqueness
Closed this issue · 7 comments
Per the draft at bdf6f2f
The current text is not easy to apply to RDF data sources, where everything can be explicitly identified. For example, the Gene Ontology OWL format uses blank nodes for good OWL reasoner reasons. Blank nodes will always be problematic for this requirement because they are, in effect, resources which will not be given an identifier. The GO OBO format sidesteps this problem because many things have no identifier at all in that format, e.g. identity is implicit in an item's position in the OBO file. This might just be a language issue, where "resource" in the RDF sense is much more specific than what you were aiming for. Similarly, in the UniProt RDF and other similar datasets we have many more explicit identifiers than in the non-RDF sources. As written, I think it punishes the formats where everything is a specific, independent resource compared with the classic ones, where subresources are only contextually identified, if at all.
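To make the blank-node point concrete, here is a minimal sketch that counts how many subjects in a small N-Triples snippet are blank nodes. The snippet itself, the `example.org` IRIs, and the helper names `subjects` and `blank_node_subjects` are all made up for illustration; this is not taken from the actual GO OWL file, and a real check would use a proper RDF parser.

```python
# Hypothetical N-Triples snippet: an OWL-style restriction expressed with a
# blank node (_:b0). The example.org IRIs are invented for illustration only.
NTRIPLES = """\
<http://example.org/GO_0000001> <http://www.w3.org/2000/01/rdf-schema#subClassOf> _:b0 .
_:b0 <http://www.w3.org/2002/07/owl#onProperty> <http://example.org/part_of> .
_:b0 <http://www.w3.org/2002/07/owl#someValuesFrom> <http://example.org/GO_0000002> .
"""


def subjects(ntriples: str) -> set[str]:
    """Return the distinct subject terms, taking the first token of each non-empty line."""
    return {line.split(None, 1)[0] for line in ntriples.splitlines() if line.strip()}


def blank_node_subjects(ntriples: str) -> set[str]:
    """Return the subjects that are blank nodes (written with a '_:' prefix in N-Triples)."""
    return {s for s in subjects(ntriples) if s.startswith("_:")}


all_subjects = subjects(NTRIPLES)
unidentified = blank_node_subjects(NTRIPLES)
print(f"{len(unidentified)} of {len(all_subjects)} subjects have no global identifier")
```

Under the metric as currently worded, the `_:b0` restriction would count against the RDF representation, even though the equivalent structure in a non-RDF format would simply not be visible as a resource at all.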
The expectation is that meaningful objects, as opposed to syntactic structures, have identifiers.
The point I am trying to make is that the RDF representations have more meaningful objects than the non-RDF ones for the same information. Or, phrased differently, the non-RDF representations don't identify most of the meaningful objects in them. I think the measurement should take this into account. At the least, it should state in some way that "resource" here means the classic data resource view, not a subresource view.
I think Michel is saying that, if the sub-resources are meaningful outside of the context of that record, then they SHOULD have identifiers.
In any case, the individual doing the FAIR evaluation will make the decision on what resource should be evaluated, so... if they think the sub-resource is not meaningful, they won't test it.
Moreover, if the individual doing the FAIR evaluation decides that the sub-resource is the target of the evaluation, there are other consequences besides the identifier. For instance, there should be metadata associated with that sub-resource instead of with the super-resource, as there may be particular and relevant characteristics present only in the sub-resource.
Jerven, could you provide a piece of text that reflects what you would want the metric to say?
An identifier for a whole should be different from the identifier of a part. E.g. the page of a book should have a different identifier than the whole book, to allow accurate retrieval and citation of data. E.g. using ISBN10:0125125127 is not acceptable if you can reasonably use J01388. This extreme example is the first rodent entry in GenBank/EMBL 1986/1987, where both the volume and the entry have identifiers. Identifiers should be specific enough that all reasonable parts are also identifiable.
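As a rough sketch of how the proposed wording could be checked, the snippet below flags a part that reuses the identifier of something it is contained in. The `Resource` class, its `part_of` field, and the function `shares_identifier_with_whole` are hypothetical names chosen for this illustration, not part of any existing FAIR metrics tooling. Only the two identifiers come from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """Hypothetical model: a resource, its identifier, and the whole it belongs to."""
    name: str
    identifier: str
    part_of: Optional["Resource"] = None


def shares_identifier_with_whole(resource: Resource) -> bool:
    """Return True if the resource reuses the identifier of any enclosing whole."""
    whole = resource.part_of
    while whole is not None:
        if whole.identifier == resource.identifier:
            return True
        whole = whole.part_of
    return False


volume = Resource("printed volume", "ISBN10:0125125127")
entry_ok = Resource("entry", "J01388", part_of=volume)
entry_bad = Resource("entry", "ISBN10:0125125127", part_of=volume)

print(shares_identifier_with_whole(entry_ok))   # False: the entry has its own identifier
print(shares_identifier_with_whole(entry_bad))  # True: the entry is only citable via the volume
```

Deciding which parts are "reasonable" to identify is still a judgment call for the evaluator. The check only covers the case where a part has been chosen as the target and turns out to share its identifier with the whole.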
closing a gen1 metric issue