nfdi4plants/arc-to-rocrate

Pragmatic RO-Crate Export

Closed this issue · 3 comments

Due to missing vocabulary in schema.org, it might be advantageous to also have a "pragmatic" RO-Crate export, which does not map the deeper parts of the ISA model like the process sequence, but only adds the most interesting metadata (like factors, characteristics) to the assays and studies.

The output object could look like this:

flowchart TD

inv[Investigation<br>- description<br>- name<br>- ...]

Assay[Assay<br>- filename<br>- comments]
Person
dots[...]
dots2[...]
files[File]
props[factors,characteristics,...]

inv --hasPart--> Assay
inv --creator--> Person
inv --> dots
Assay --hasPart--> files
Assay --> dots2
Assay --about--> props
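
A minimal Python sketch of what such a pragmatic export could produce, following the edges of the diagram (`hasPart`, `creator`, `about`). The function name `pragmatic_crate`, the input dictionaries, the use of `Dataset` for investigation and assay, and `PropertyValue` for factors and characteristics are all assumptions for illustration and not a decided mapping. The RO-Crate metadata descriptor entity is left out for brevity.

```python
import json


def pragmatic_crate(investigation, persons, assays):
    """Build a flat JSON-LD graph that follows the edges of the diagram."""
    graph = [{
        "@id": "./",
        "@type": "Dataset",  # assumption: Investigation exported as a Dataset
        "name": investigation["name"],
        "description": investigation["description"],
        "creator": [{"@id": p["@id"]} for p in persons],
        "hasPart": [{"@id": a["@id"]} for a in assays],
    }]
    graph.extend(persons)
    for assay in assays:
        # Factors and characteristics become flat PropertyValue nodes
        # (assumption); no process sequence is emitted.
        props = [
            {"@id": f"{assay['@id']}#{name}", "@type": "PropertyValue",
             "name": name, "value": value}
            for name, value in assay["properties"].items()
        ]
        files = [{"@id": path, "@type": "File"} for path in assay["files"]]
        graph.append({
            "@id": assay["@id"],
            "@type": "Dataset",  # assumption: Assay exported as a Dataset
            "hasPart": [{"@id": f["@id"]} for f in files],
            "about": [{"@id": p["@id"]} for p in props],
        })
        graph.extend(files + props)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Hypothetical input data, only to exercise the sketch.
crate = pragmatic_crate(
    {"name": "Example investigation", "description": "Hypothetical input"},
    [{"@id": "Persons/LukasWeil", "@type": "Person"}],
    [{"@id": "assays/a1/", "files": ["assays/a1/data.csv"],
      "properties": {"temperature": "25 C"}}],
)
print(json.dumps(crate, indent=2))
```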


This stands in contrast to a more complex mapping like this:

flowchart TD

inv[Investigation<br>- description<br>- name<br>- ...]

Assay[Assay<br>- filename<br>- comments]
Person
dots[...]
dots2[...]
files[File]
procs[ProcessSequence]
proc1[Process 1]
proc2[Process 2]
prot[Protocol<br>- characteristics<br>- ...]
sample1[Sample 1]
sample2[Sample 2]

inv --hasPart--> Assay
inv --creator--> Person
inv --> dots
Assay --hasPart--> files
Assay --> dots2
Assay --?--> procs
procs --?--> proc1
procs --?--> proc2
proc1 --next--> proc2
proc1 --protocol--> prot
proc1 --input--> sample1
proc1 --"output"--> sample2
proc2 --input--> sample2
proc2 --"output"--> files


An intuitive example of such a strategy is the mapping between an isa:Person and an sdo:Person. Both types include a property affiliation, but the type of that property differs between the two vocabularies. In ISA, the affiliation is a string, whereas in schema.org it is an object of type sdo:Organization. So there is a perfect semantic mapping, but the syntax doesn't fit.

In this example, the problem can be fixed by expanding the ISA property into an object that contains only the name. We keep a straightforward semantic mapping by accepting a looser structural mapping.

Similar techniques can also be applied to the process sequence and its related objects. There, we would reduce this part to what the two models have in common instead of extending it.

For the following original ISA-JSON person

{
  "@id": "Persons/LukasWeil",
  "@type": "Person",
  "firstName": "Lukas",
  "lastName": "Weil",
  "affiliation": "Universiteee"
}

the resulting RO-Crate-compliant schema.org markup would look like this:

{
  "@id": "Persons/LukasWeil",
  "@type": "Person",
  "firstName": "Lukas",
  "lastName": "Weil",
  "affiliation": {
    "@type": "Organization",
    "@id": "Organization/Universiteee",
    "name": "Universiteee",
    "@context": {
      "sdo": "http://schema.org/",
      "Organization": "sdo:Organization",
      "name": "sdo:name"
    }
  },
  "@context": {
    "sdo": "http://schema.org/",
    "Person": "sdo:Person",
    "firstName": "sdo:givenName",
    "lastName": "sdo:familyName",
    "affiliation": "sdo:affiliation"
  }
}
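
The expansion step could be sketched in Python roughly as follows. The function name `expand_affiliation` is hypothetical, the `Organization/<name>` identifier scheme is copied from the example above, and the `@context` entries are omitted to keep the sketch short. Names containing spaces or special characters would need extra handling for the `@id`.

```python
import copy


def expand_affiliation(isa_person):
    """Return a copy of the person with a string affiliation turned into an Organization object."""
    person = copy.deepcopy(isa_person)
    name = person.get("affiliation")
    if isinstance(name, str):
        person["affiliation"] = {
            "@type": "Organization",
            "@id": f"Organization/{name}",  # id scheme taken from the example above
            "name": name,
        }
    return person


isa_person = {
    "@id": "Persons/LukasWeil",
    "@type": "Person",
    "firstName": "Lukas",
    "lastName": "Weil",
    "affiliation": "Universiteee",
}
crate_person = expand_affiliation(isa_person)
print(crate_person["affiliation"])
```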
HLWeil commented

The RO-Crate (and therefore also the ARC json-ld) should be designed in a way to allow complete (without loss) import into the ARC datamodel. So the ARC json-ld should cover all logical connections that are important for the ARC.

This would be a hard constraint, by which we should make our decision between semantic and structural mapping. Your affiliation example from above would work with this principle. For the rest, I think we would be closer to the non-pragmatic solution.
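
As an illustration of why the affiliation example satisfies this constraint, here is a small Python sketch that collapses the exported person back to the ISA form and checks that nothing was lost. The function name `collapse_affiliation` is hypothetical, and dropping `@context` on import is an assumption of this sketch.

```python
def collapse_affiliation(crate_person):
    """Turn an Organization object back into the plain ISA affiliation string."""
    # Assumption: the JSON-LD context carries no ARC data and can be dropped on import.
    person = {k: v for k, v in crate_person.items() if k != "@context"}
    org = person.get("affiliation")
    if isinstance(org, dict):
        person["affiliation"] = org["name"]
    return person


original = {
    "@id": "Persons/LukasWeil",
    "@type": "Person",
    "firstName": "Lukas",
    "lastName": "Weil",
    "affiliation": "Universiteee",
}
exported = {
    "@id": "Persons/LukasWeil",
    "@type": "Person",
    "firstName": "Lukas",
    "lastName": "Weil",
    "affiliation": {
        "@type": "Organization",
        "@id": "Organization/Universiteee",
        "name": "Universiteee",
    },
    "@context": {"sdo": "http://schema.org/", "Person": "sdo:Person"},
}

# The import reproduces the original ISA person exactly.
assert collapse_affiliation(exported) == original
```

The check only passes because the Organization object carries nothing beyond the name. A richer object (for example one with an address) would not survive the collapse, which is the kind of loss the constraint rules out.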

After some work, the non-pragmatic approach was implemented in https://github.com/nfdi4plants/isa-ro-crate-profile.

The problem with the processes was circumvented by creating the LabProcess type.

The general problems of mappability were covered by writing a dedicated parser, which not only uses ISA-JSON plus a context but actually writes RO-Crate JSON.