Support RO-crate specification
Read the following for context:
The question for me is what role the RO-crate specification should play in defining the dataset schema in LinkML. From the issue that's linked above:
Currently working on building a LinkML model for a datalad dataset, and trying to figure out if it's better to
(1) first create a linkml model for an RO-crate and then import that into a separate linkml model for a datalad dataset, i.e. making the latter a subset of the former; or
(2) create the linkml model for the datalad dataset and only bring in some properties (i.e. slots) from an ro-crate such that it is compatible as a by-product.
To add to (2): one could build a model that is completely separate from RO-crate and purely follows what we see as the ideal for a datalad dataset metadata structure, and then bring in compatibility with RO-crate as a separate tool, i.e. exporting to the RO-crate specification would be one of many supported "translation" options.
@mih @mslw @christian-monch curious to hear your thoughts on this
Exactly. An ro-crate should be an export of a datalad data model for a single version of a dataset.
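If the RO-crate is treated purely as an export of one dataset version, the translation step could look roughly like the sketch below. The DatasetVersion and Person classes and the to_ro_crate function are hypothetical stand-ins, not part of DataLad or LinkML. Only the output layout follows RO-Crate 1.1: a metadata descriptor whose "about" points at the root "./" Dataset, with every file and person listed once in @graph and referenced by @id.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Person:
    # Hypothetical contextual entity; local_id becomes its RO-Crate @id.
    local_id: str
    name: str


@dataclass
class DatasetVersion:
    # Hypothetical stand-in for one version of a DataLad dataset.
    name: str
    description: str
    files: list[str] = field(default_factory=list)
    authors: list[Person] = field(default_factory=list)


RO_CRATE_CONTEXT = "https://w3id.org/ro/crate/1.1/context"


def to_ro_crate(ds: DatasetVersion) -> dict:
    """Export one dataset version as a flat RO-Crate 1.1 JSON-LD document."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": ds.name,
        "description": ds.description,
        # The root only references its parts; their definitions live in @graph.
        "hasPart": [{"@id": f} for f in ds.files],
        "author": [{"@id": p.local_id} for p in ds.authors],
    }
    files = [{"@id": f, "@type": "File"} for f in ds.files]
    people = [{"@id": p.local_id, "@type": "Person", "name": p.name} for p in ds.authors]
    return {"@context": RO_CRATE_CONTEXT, "@graph": [descriptor, root, *files, *people]}


crate = to_ro_crate(
    DatasetVersion(
        name="Example RO-Crate",
        description="The RO-Crate Root Data Entity",
        files=["data1.txt", "data2.txt"],
        authors=[Person("#alice", "Alice")],
    )
)
print(json.dumps(crate, indent=2))
```

Keeping the DataLad model as the source of truth and generating this flat structure on export means RO-crate compatibility never constrains the LinkML schema itself.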
Useful context and existing work:
Starting with an effort to model an RO-crate with linkml. It seems the first step would be to decide on a good input representation.
Initially, I thought it would be good to take an RO-crate and frame it with something like
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@type": "http://schema.org/Dataset"
}
to get a hierarchical representation. However, this ruins the deduplicating nature of an RO-crate (a flat array of elementary object definitions, i.e. a person such as an author appears only once in a record). Moreover, LinkML IO tooling will strip anything that starts with @, including @id, which is essential in an RO-crate because it represents the "filename/location" in a dataset.
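Both problems can be made concrete with a toy sketch in plain Python. It calls no JSON-LD library and is not the JSON-LD framing algorithm: embed is a hypothetical helper that simply inlines every bare {"@id": ...} reference starting from the root, and strip_at_keys is a hypothetical helper that mimics the stripping behaviour described above. The input is a trimmed variant of the example crate in which both files are authored by #alice (an assumption made for illustration).

```python
import json


def embed(graph, root_id):
    """Toy stand-in for framing: replace each bare {"@id": ...} reference
    with a copy of the node it points to, starting from root_id."""
    index = {node["@id"]: node for node in graph}

    def inline(value, seen):
        if isinstance(value, list):
            return [inline(v, seen) for v in value]
        if isinstance(value, dict):
            ref = value.get("@id")
            # Bare reference to a known node not already on this path: inline it.
            if len(value) == 1 and ref in index and ref not in seen:
                return inline(index[ref], seen | {ref})
            return {k: inline(v, seen) for k, v in value.items()}
        return value

    return inline(index[root_id], {root_id})


def strip_at_keys(value):
    """Drop every key that starts with "@", as the loader behaviour above is described."""
    if isinstance(value, list):
        return [strip_at_keys(v) for v in value]
    if isinstance(value, dict):
        return {k: strip_at_keys(v) for k, v in value.items() if not k.startswith("@")}
    return value


graph = [
    {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "data1.txt"}, {"@id": "data2.txt"}]},
    {"@id": "data1.txt", "@type": "File", "author": {"@id": "#alice"}},
    {"@id": "data2.txt", "@type": "File", "author": {"@id": "#alice"}},
    {"@id": "#alice", "@type": "Person", "name": "Alice"},
]
tree = embed(graph, "./")
print(json.dumps(graph).count('"name": "Alice"'))  # 1: defined once in the flat graph
print(json.dumps(tree).count('"name": "Alice"'))   # 2: copied under each file
print(strip_at_keys(tree))  # {'hasPart': [{'author': {'name': 'Alice'}}, {'author': {'name': 'Alice'}}]}
```

After stripping, nothing identifies which part is data1.txt and which is data2.txt, which is exactly the information an RO-crate uses to locate files.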
Maybe it would be better to use something like this
{
"@context": "http://schema.org/",
"@graph": [
{
"id": "ro-crate-metadata.json",
"type": "CreativeWork",
"dct:conformsTo": {
"id": "https://w3id.org/ro/crate/1.1"
},
"about": {
"id": "./"
},
"description": "RO-Crate Metadata File Descriptor (this file)"
},
{
"id": "./",
"type": "Dataset",
"description": "The RO-Crate Root Data Entity",
"hasPart": [
{
"id": "data1.txt"
},
{
"id": "data2.txt"
}
],
"name": "Example RO-Crate"
},
{
"id": "data1.txt",
"type": "MediaObject",
"author": {
"id": "#alice"
},
"contentLocation": {
"id": "http://sws.geonames.org/8152662/"
},
"description": "One of hopefully many Data Entities"
},
{
"id": "data2.txt",
"type": "MediaObject"
},
{
"id": "#alice",
"type": "Person",
"description": "One of hopefully many Contextual Entities",
"name": "Alice"
},
{
"id": "http://sws.geonames.org/8152662/",
"type": "Place",
"name": "Catalina Park"
}
]
}
This is a plain RO-crate passed through JSON-LD compaction with the context
{
"@context": "http://schema.org/"
}
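In this compacted form, id and type appear as plain keys, so the entities can be handled as ordinary objects. A minimal sketch of working from @graph alone, assuming a trimmed copy of the compacted example above; index_graph, resolve and root_entity are hypothetical helper names. Entities are indexed by id, references are followed by lookup, and the root is found through the metadata descriptor's "about" property, as RO-Crate 1.1 prescribes.

```python
def index_graph(doc):
    """Map each entity's "id" to its definition, using only the @graph array."""
    return {node["id"]: node for node in doc["@graph"]}


def resolve(index, ref):
    """Follow a reference such as {"id": "#alice"} to the entity it names."""
    return index[ref["id"]]


def root_entity(index):
    """The metadata descriptor's "about" points at the root data entity."""
    return resolve(index, index["ro-crate-metadata.json"]["about"])


doc = {
    "@context": "http://schema.org/",
    "@graph": [
        {"id": "ro-crate-metadata.json", "type": "CreativeWork", "about": {"id": "./"}},
        {"id": "./", "type": "Dataset", "name": "Example RO-Crate", "hasPart": [{"id": "data1.txt"}]},
        {"id": "data1.txt", "type": "MediaObject", "author": {"id": "#alice"}},
        {"id": "#alice", "type": "Person", "name": "Alice"},
    ],
}
index = index_graph(doc)
root = root_entity(index)
first_file = resolve(index, root["hasPart"][0])
print(root["name"], resolve(index, first_file["author"])["name"])  # Example RO-Crate Alice
```

Each entity is still defined exactly once, so the deduplication of the original crate is preserved.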
We could now process @graph only. However, with a complex RO-crate this may not work, because mixing context sources yields something like this:
{
"@context": "https://w3id.org/ro/crate/1.1/context",
"@graph": [
{
"@id": "ro-crate-metadata.json",
"@type": "CreativeWork",
"conformsTo": {
"@id": "https://w3id.org/ro/crate/1.1"
},
"about": {
"@id": "./"
},
"description": "RO-Crate Metadata File Descriptor (this file)"
},
...
with context
{
"@context": "https://w3id.org/ro/crate/1.1/context"
}
Maybe we need a custom pre-processor...
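One possible shape for such a pre-processor, sketched in plain Python under stated assumptions: it keeps the flat @graph (so deduplication survives), renames @id and @type to id and type so a loader that strips "@"-prefixed keys keeps them, and wraps the entities under a top-level entities key that a LinkML container class could declare. preprocess_crate, rename_keys and the entities key are hypothetical names, not existing LinkML or RO-Crate tooling.

```python
# Assumption: the LinkML side models identifiers and types as plain "id"/"type" slots.
RENAMES = {"@id": "id", "@type": "type"}


def rename_keys(value):
    """Recursively rename JSON-LD keywords so a loader that drops '@' keys keeps them."""
    if isinstance(value, list):
        return [rename_keys(v) for v in value]
    if isinstance(value, dict):
        out = {}
        for key, item in value.items():
            new_key = RENAMES.get(key, key)
            # Refuse to silently merge an existing "id"/"type" with a renamed keyword.
            if new_key in out:
                raise ValueError(f"key collision on {new_key!r}")
            out[new_key] = rename_keys(item)
        return out
    return value


def preprocess_crate(doc):
    """Hypothetical pre-processor: keep the flat @graph with keywords renamed."""
    return {"entities": rename_keys(doc["@graph"])}


doc = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "data1.txt"}]},
    ],
}
result = preprocess_crate(doc)
print(result["entities"][1])  # {'id': './', 'type': 'Dataset', 'hasPart': [{'id': 'data1.txt'}]}
```

Because @context is discarded here, an exporter going the other way would need to re-attach the RO-crate context and map id/type back to @id/@type.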