mapping-commons/sssom

Conversion to JSON-LD is not working as expected

When the SSSOM table is converted to JSON-LD, slots whose range is EntityReference are given rdfs:Resource as their datatype in the generated @context. JSON-LD therefore treats each value as a typed literal rather than an IRI, so the CURIE prefixes are never expanded.
These slots should instead be declared with "@type": "@id". For that to happen, the range in the schema should be uri.
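To make the difference concrete, here is a minimal sketch of how a term definition's "@type" changes the way a value is read. It is a simplified illustration, not a JSON-LD processor: `interpret_value`, `expand_curie`, and the `PREFIXES` map (including the MP prefix IRI) are assumptions made up for this example.

```python
# Simplified illustration only; this is NOT a real JSON-LD processor.
# The prefix IRIs below are assumptions chosen for the example.
PREFIXES = {
    "MP": "http://purl.obolibrary.org/obo/MP_",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand_curie(curie, prefixes):
    """Expand 'prefix:local' to a full IRI; return the input unchanged if the prefix is unknown."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return curie


def interpret_value(term_definition, value, prefixes):
    """Return ('iri', iri) or ('literal', lexical_form, datatype_iri)."""
    coercion = term_definition.get("@type")
    if coercion == "@id":
        # "@type": "@id" means the value is a node reference, so the CURIE is expanded.
        return ("iri", expand_curie(value, prefixes))
    # Any other "@type" is a datatype: the value stays a string literal.
    datatype = expand_curie(coercion, prefixes) if coercion else None
    return ("literal", value, datatype)


as_generated = interpret_value({"@type": "rdfs:Resource"}, "MP:0001289", PREFIXES)
as_proposed = interpret_value({"@type": "@id"}, "MP:0001289", PREFIXES)
print(as_generated)
print(as_proposed)
```

With the generated context the value stays the string "MP:0001289" tagged with a datatype; with "@type": "@id" it becomes a resource that a triple store can link to.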

Here is a piece of the JSON-LD @context

"@context": {
    ...
    "subject_id": {
        "@type": "rdfs:Resource",
        "@id": "owl:annotatedSource"
      },
    "predicate_id": {
        "@type": "rdfs:Resource",
        "@id": "owl:annotatedProperty"
      },
    "object_id": {
        "@type": "rdfs:Resource",
        "@id": "owl:annotatedTarget"
      },
    "mapping_justification": {
        "@type": "rdfs:Resource"
      },
    ...
}
    

Here is how it's loaded in RDF4J

Screenshot 2022-11-25 at 12 00 19

After changing the type to @id (except for subject_id), this is how it looks in RDF4J

Screenshot 2022-11-22 at 16 17 34
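The manual change described above can be sketched as a small post-processing step on the generated @context. This is only a workaround sketch, not a fix in the generator: the function name `coerce_to_iri` and its `skip` and `old_type` parameters are hypothetical, and leaving subject_id untouched simply mirrors what was done by hand here.

```python
import copy


def coerce_to_iri(context, skip=("subject_id",), old_type="rdfs:Resource"):
    """Return a copy of a JSON-LD context in which terms typed as `old_type` use "@type": "@id".

    Terms listed in `skip` are left as they are (hypothetical parameter,
    mirroring the manual edit that excluded subject_id).
    """
    patched = copy.deepcopy(context)
    for term, definition in patched.items():
        # Only expanded term definitions (dicts) can carry an "@type".
        if term in skip or not isinstance(definition, dict):
            continue
        if definition.get("@type") == old_type:
            definition["@type"] = "@id"
    return patched


# Term definitions copied from the context excerpt in this issue.
context = {
    "subject_id": {"@type": "rdfs:Resource", "@id": "owl:annotatedSource"},
    "predicate_id": {"@type": "rdfs:Resource", "@id": "owl:annotatedProperty"},
    "object_id": {"@type": "rdfs:Resource", "@id": "owl:annotatedTarget"},
    "mapping_justification": {"@type": "rdfs:Resource"},
}

patched = coerce_to_iri(context)
for term, definition in patched.items():
    print(term, definition)
```

The input context is deep-copied, so the original stays available for comparison.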

There is also a second issue, with Mapping: the conversion does not add "@type": "Mapping" to each mapping, even though the @context defines Mapping.

"@context": {
    ...
    "Mapping": {
       "@id": "owl:Axiom"
    }
    ...
}
"mappings": [
      {
        "subject_id": "MP:0001289",
        "predicate_id": "skos:closeMatch",
        "object_id": "HP:0007968",
        "mapping_justification": "semapv:LexicalMatching",
        "subject_label": "persistence of hyaloid vascular system",
        "object_label": "Remnants of the hyaloid vascular system"
      },
      {
        "subject_id": "MP:0001293",
        "predicate_id": "skos:exactMatch",
        "object_id": "HP:0000528",
        "mapping_justification": "semapv:LexicalMatching",
        "subject_label": "anophthalmia",
        "object_label": "Anophthalmia"
      },
      {
        "subject_id": "MP:0001303",
        "predicate_id": "skos:closeMatch",
        "object_id": "HP:0000517",
        "mapping_justification": "semapv:LexicalMatching",
        "subject_label": "abnormal lens morphology",
        "object_label": "Abnormality of the lens"
      }
]
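Until the converter emits the type itself, the missing "@type" could be added in a post-processing pass. The following is a sketch under that assumption; `add_mapping_type` is a hypothetical helper, not part of sssom-py or LinkML.

```python
def add_mapping_type(document, type_name="Mapping"):
    """Return a shallow copy of `document` in which every entry of "mappings" has an "@type".

    An "@type" already present on a mapping is kept, because the
    unpacked mapping comes after the default in the dict literal.
    """
    typed = dict(document)
    typed["mappings"] = [
        {"@type": type_name, **mapping} for mapping in document.get("mappings", [])
    ]
    return typed


# Shortened example document, using one mapping from this issue.
document = {
    "@context": {"Mapping": {"@id": "owl:Axiom"}},
    "mappings": [
        {
            "subject_id": "MP:0001293",
            "predicate_id": "skos:exactMatch",
            "object_id": "HP:0000528",
            "mapping_justification": "semapv:LexicalMatching",
        }
    ],
}

typed = add_mapping_type(document)
print(typed["mappings"][0])
```

Because the @context maps Mapping to owl:Axiom, each typed mapping should then be read as an owl:Axiom node when the JSON-LD is loaded.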

Excellent analysis, this makes sense. Can you move this issue here please: https://github.com/linkml/linkml/issues
and cc me?

The root of this problem can be found in the sssom_schema EntityReference type:

 EntityReference:
    typeof: uriorcurie
    description: A reference to a mapped entity. This is represented internally as a string, and as a resource in RDF
    base: str
    uri: rdfs:Resource

The issues here include:

  1. There is no rdfs:Resource URI. As noted above, the reference should be rdf:Resource
  2. The definition doesn't align with the type itself: typeof: uriorcurie declares it to be a URI or a CURIE, but the uri: rdfs:Resource line then overrides the type used in the output.
  3. The uri and uriorcurie types identify literals of type xsd:anyURI. What we actually want here is a reference to
    an "Entity" (i.e. a class). I'll poke around with the model a bit, but what I suspect you actually want is:
classes:
  Entity:
slots:
  subject_id:
    description: The ID of the subject of the mapping.
    range: Entity
    required: true

I'll give this approach a try and will get back to you.

Thank you @hsolbrig. Do you think simply doing #244 would be enough?

EDIT: Unfortunately, only the old solution using rdfs:Resource produces the expected results during the TTL conversion.

Unfortunately, only the old solution using rdfs:Resource produces the expected results during the TTL conversion.

There is no rdfs:Resource URI. As noted above, the reference should be rdf:Resource

See here for a Stack Overflow discussion on the subject.

Here is my attempt to implement your suggestion, but I keep getting obscure errors like:

  File "/Users/matentzn/Library/Caches/pypoetry/virtualenvs/sssom-schema-tmWuLoCj-py3.10/lib/python3.10/site-packages/linkml/utils/converter.py", line 140, in cli
    obj = loader.load(source=input, target_class=py_target_class, **inargs)
  File "/Users/matentzn/Library/Caches/pypoetry/virtualenvs/sssom-schema-tmWuLoCj-py3.10/lib/python3.10/site-packages/linkml_runtime/loaders/rdflib_loader.py", line 255, in load
    objs = self.from_rdf_graph(g, schemaview=schemaview, target_class=target_class, prefix_map=prefix_map, **kwargs)
  File "/Users/matentzn/Library/Caches/pypoetry/virtualenvs/sssom-schema-tmWuLoCj-py3.10/lib/python3.10/site-packages/linkml_runtime/loaders/rdflib_loader.py", line 127, in from_rdf_graph
    v = self._uri_to_id(o, id_slot, schemaview)
  File "/Users/matentzn/Library/Caches/pypoetry/virtualenvs/sssom-schema-tmWuLoCj-py3.10/lib/python3.10/site-packages/linkml_runtime/loaders/rdflib_loader.py", line 220, in _uri_to_id
    if schemaview.is_type_percent_encoded(id_slot.range):
AttributeError: 'NoneType' object has no attribute 'range'
make: *** [tests/output/out.json] Error 1

@hrshdhgd I dropped the ball on this. Can you try to understand what exactly this issue is about and how #244 is related, and what needs to be done to finish this?

High (but not very high) priority.