Define the isamples project IRI policy
Referenced code: metadata/isamples_metadata/Transformer.py, line 24 (commit ad51e70)
The iSamples project will make extensive use of IRIs for referencing resources such as vocabularies, contexts, metadata, and so forth.
A policy for IRI structure needs to be put in place so that IRIs are constructed consistently. In particular, IRIs should be reliably dereferenceable to the target resource.
Note also that IRI comparison differs between uses. For example, IRI comparison in the context of RDF processing uses literal string comparison [1, 2]. IRI comparison in the context of resource access involves determining whether two IRIs will resolve to the same resource, and hence some canonicalization applies [3].
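To make the difference concrete, here is a minimal Python sketch contrasting RDF-style simple string comparison with a normalizing comparison. The normalization steps (scheme and host lowercasing, percent-encoding cleanup, dot-segment removal, default-port and empty-path handling for http(s)) follow the syntax- and scheme-based normalization described in RFC 3986 §6.2 and the RFC 3987 comparison ladder. The function names are hypothetical, the sketch assumes no userinfo or IPv6 literal in the authority, and it leaves out Unicode character normalization.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# RFC 3986 "unreserved" characters: percent-encodings of these may be decoded.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
# Scheme-based normalization: default ports that can be dropped.
DEFAULT_PORTS = {"http": 80, "https": 443}


def simple_equal(a: str, b: str) -> bool:
    """RDF-style comparison: character by character, no normalization."""
    return a == b


def _normalize_percent(s: str) -> str:
    """Decode %XX for unreserved characters; uppercase the hex digits otherwise."""
    def fix(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, s)


def _remove_dot_segments(path: str) -> str:
    """Resolve '.' and '..' segments in an absolute path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            if len(out) > 1:  # never pop the leading "" of an absolute path
                out.pop()
            continue
        out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # "/a/b/.." becomes "/a/", keeping the trailing slash
    return "/".join(out)


def normalize_iri(iri: str) -> str:
    """Normalize an http(s) IRI (assumes no userinfo and no IPv6 literal)."""
    parts = urlsplit(iri)
    scheme = parts.scheme.lower()
    netloc = (parts.hostname or "").lower()
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += f":{parts.port}"
    path = _remove_dot_segments(_normalize_percent(parts.path))
    if not path and scheme in DEFAULT_PORTS:
        path = "/"  # for http(s), an empty path is equivalent to "/"
    query = _normalize_percent(parts.query)
    fragment = _normalize_percent(parts.fragment)
    return urlunsplit((scheme, netloc, path, query, fragment))
```

For example, `HTTPS://W3ID.org:443/isamples/./schema/%7euser` and `https://w3id.org/isamples/schema/~user` are distinct terms to an RDF processor (`simple_equal` is false), yet both normalize to the same string and would be dereferenced to the same resource.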
See also #24
[1]: § 6.4. RDF URI Reference
[2]: RFC 3987 § 5.3.1. Simple String Comparison
[3]: RFC 3987 § 5.1. Equivalence
Use w3id to manage namespaces. It provides flexibility for change and works effectively for many other groups with similar requirements.
Resources needing resolvable namespaces:
Schemas, ontologies, vocabularies
: Generally hosted on GitHub, served through githack or preferably via isamples.github.io
Records
: Each record uniquely addressable through @id. May need an iSamples service for rendering if the requested format (+profile) is not available at the origin
Documentation
: Static documentation for project and components based at isamples.github.io
Note from JK on versioning:
extraversioned: a version identifier is separate from the id string, so that the actionable id does not lead to a specific version without human intervention, e.g., “http://doi.org/10.2345/67”, Version 4.
intraversioned: a version identifier is part of the id string, e.g., “http://doi.org/10.2345/67.V4”.
introversioned: a kind of intraversioned content for which the version identifier (within the object identifier) is opaque, e.g., “http://doi.org/10.2345/678”, which happens to be version 4.
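The three styles differ in where a client can find the version. The following minimal Python sketch assumes a hypothetical `.V<n>` suffix rule (taken from the intraversioned example above) and a hypothetical registry dict standing in for whatever service would record introversioned identifiers. It shows that only the intraversioned form exposes its version to a parser; the other two need information from outside the id string.

```python
import re
from typing import Optional, Tuple

# Hypothetical suffix convention, taken from the example ".../10.2345/67.V4".
INTRA_SUFFIX = re.compile(r"^(?P<base>.+)\.V(?P<version>\d+)$")

# Introversioned ids are opaque, so the version can only come from an
# external record. This dict stands in for that (hypothetical) registry.
INTRO_REGISTRY = {"http://doi.org/10.2345/678": "4"}


def classify(identifier: str,
             separate_version: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (versioning style, version) for an identifier."""
    # Extraversioned: the version travels alongside the id, never inside it.
    if separate_version is not None:
        return "extraversioned", separate_version
    # Intraversioned: the version can be parsed out of the id string itself.
    m = INTRA_SUFFIX.match(identifier)
    if m:
        return "intraversioned", m.group("version")
    # Introversioned: nothing in the string reveals the version; look it up.
    if identifier in INTRO_REGISTRY:
        return "introversioned", INTRO_REGISTRY[identifier]
    return "unversioned", None
```

Note that the parser can only recognize intraversioned ids because it knows the suffix convention in advance; without an agreed pattern, an intraversioned id is indistinguishable from an introversioned one.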
We had many discussions about this for the MIxS standard (see GenomicsStandardsConsortium/mixs#233). In the end, we went with a structure similar to what is used for some OBO ontologies (GenomicsStandardsConsortium/mixs#233 (comment)).
Here are some corresponding suggestions for iSamples:
Schemas, ontologies, vocabularies (i.e., structured collections of terms)
: Generally hosted on GitHub, served through githack or preferably via isamples.github.io
CORE:
https://w3id.org/isamples/schema/ - resolves to the most recent version of the schema
https://w3id.org/isamples/schema - resolves to the home page for the schema
https://w3id.org/isamples/schema/####### - resolves to individual term pages, where ####### is an opaque ID for each term
We may want to pick a string other than "schema".
https://w3id.org/isamples/yyyymmdd/schema/ - resolves to specific versions
Domain-specific packages:
https://w3id.org/isamples/schema/package_name - resolves to the current version
https://w3id.org/isamples/yyyymmdd/schema/package_name/ - resolves to specific versions
Use opaque IDs for package names.
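The proposed paths can be expressed as a small routing table. On w3id itself, redirects are configured as Apache rewrite rules in an `.htaccess` file in the w3id.org repository; the Python function below only models the routing logic so it can be checked. The target origin, the `latest` alias, the directory names, and the package ID `pkg0001` are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical origin and directory layout; the thread only says content is
# preferably served via isamples.github.io.
ORIGIN = "https://isamples.github.io/schema"
LATEST = "latest"  # hypothetical alias directory for the newest release
# Term IDs and package names are both opaque strings directly under schema/,
# so the resolver must know which IDs name packages (hypothetical values).
KNOWN_PACKAGES = {"pkg0001"}


def resolve(path: str) -> Optional[str]:
    """Map a w3id path such as '/isamples/20210601/schema/' to a target URL."""
    parts = path.split("/")
    if len(parts) < 3 or parts[0] != "" or parts[1] != "isamples":
        return None
    rest = parts[2:]
    date = None
    if re.fullmatch(r"[0-9]{8}", rest[0]):  # optional yyyymmdd segment
        date, rest = rest[0], rest[1:]
    if not rest or rest[0] != "schema":
        return None
    rest = rest[1:]
    version = date or LATEST
    if not rest:  # ".../schema" (no trailing slash): schema home page
        return f"{ORIGIN}/index.html"
    if rest == [""]:  # ".../schema/": the whole schema at that version
        return f"{ORIGIN}/{version}/schema"
    ident, tail = rest[0], rest[1:]
    if not ident or tail not in ([], [""]):  # allow one optional trailing slash
        return None
    kind = "packages" if ident in KNOWN_PACKAGES else "terms"
    return f"{ORIGIN}/{version}/{kind}/{ident}"
```

Writing the routes out this way surfaces one design question: because opaque term IDs and opaque package names share the position directly after `schema/`, a resolver cannot tell them apart without a lookup such as `KNOWN_PACKAGES`. Distinct path prefixes for terms and packages would be one way to avoid that.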
Documentation
If we use LinkML for the schemas, it will automatically generate documentation pages which we can host independently (e.g., isamplesschema.org) or as part of our website.
Records
: Each record uniquely addressable through @id. May need an iSamples service for rendering if the requested format (+profile) is not available at the origin
We need to have a discussion about what this entails. By "record", do we mean one record per sample? If we create a separate ID for the record, how does that align with what DiSSCo is doing?
In terms of w3id implementation, having the version ID string embedded in the URI (e.g., https://w3id.org/isamples/yyyymmdd/schema) is much easier to implement. Also note that it's handy and user-friendly to allow constructs like https://w3id.org/isamples/yyyymmdd/schema.ttl, https://w3id.org/isamples/yyyymmdd/schema.txt, and https://w3id.org/isamples/yyyymmdd/schema.html to get specific representations without having to implement content negotiation.
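The extension convention amounts to a fixed lookup from file extension to media type, which is what makes it simpler than content negotiation. A minimal Python sketch: `text/turtle` and `text/html` are the registered media types for Turtle and HTML, while treating `.txt` as `text/plain` and falling back to HTML when there is no extension are assumptions for illustration.

```python
import posixpath
from typing import Tuple

# Extension -> media type. Mapping .txt to text/plain is an assumption about
# what that representation would contain.
EXTENSIONS = {
    ".ttl": "text/turtle",
    ".html": "text/html",
    ".txt": "text/plain",
}


def representation(path: str, default: str = "text/html") -> Tuple[str, str]:
    """Split '/isamples/20210601/schema.ttl' into (resource path, media type).

    Without a recognized extension, the path is returned unchanged with a
    hypothetical default media type.
    """
    base, ext = posixpath.splitext(path)
    if ext in EXTENSIONS:
        return base, EXTENSIONS[ext]
    return path, default
```

Because the version is part of the path and the format is part of the file name, each (version, format) pair is a distinct, stable URL that can be bookmarked or cited directly.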
In the DiSSCo approach, there is a separate identifier for the metadata record ('digitalSpecimen') and for the actual specimen (materialSample in our current thinking; keep in mind the social science view of 'sample' as a subset of some population, not a physical entity). I think we can take the approach that the metadata record ('digitalSpecimen') is the online representation of an entity in the world that can't be transmitted over the wire (https://en.wikipedia.org/wiki/HTTPRange-14). So resolving a sample identifier gets the metadata record ('digitalSpecimen') for the sample via a 303 redirect; resolving the URI for the metadata record gets the same thing, but without the 303 redirect.
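The behaviour described above follows the W3C TAG's httpRange-14 resolution: a 2xx response implies the IRI names an information resource, while a 303 See Other points to a description without making that claim. A minimal Python sketch of the decision, assuming hypothetical `sample/` and `record/` prefixes under the w3id namespace and a dict standing in for whatever links each sample to its metadata record.

```python
from typing import Dict, Optional, Tuple

# Hypothetical identifier layout: sample IRIs and record IRIs use different
# (made-up) prefixes, and this dict links each sample to its metadata record.
SAMPLE_PREFIX = "https://w3id.org/isamples/sample/"
RECORD_PREFIX = "https://w3id.org/isamples/record/"
SAMPLE_TO_RECORD: Dict[str, str] = {
    SAMPLE_PREFIX + "abc123": RECORD_PREFIX + "abc123",
}


def respond(iri: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header or None) for a requested IRI."""
    if iri in SAMPLE_TO_RECORD.values():
        # The metadata record is an information resource: serve it directly.
        return 200, None
    record = SAMPLE_TO_RECORD.get(iri)
    if record is not None:
        # The sample is a physical thing that cannot be sent over the wire,
        # so redirect with 303 See Other to the record that describes it.
        return 303, record
    return 404, None
```

A client following redirects ends up with the same metadata record either way; only the sample IRI incurs the extra round trip.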