RMLio/rmlmapper-java

Support for nested data?

paulmillar opened this issue · 7 comments

A little while ago, I came across a very interesting paper, presented at the Knowledge Graph Construction Workshop 2021, called Integrating Nested Data into Knowledge Graphs with RML Fields by Thomas Delva, Dylan Van Assche, Pieter Heyvaert, Ben De Meester and Anastasia Dimou. A recording of Thomas' presentation is also available.

I believe the features that Thomas described are not currently supported in RMLMapper, but could be very useful.

I was wondering if there are any plans to include his work?

Hi!

Nested data is already supported in the form of JSONPath or XPath expressions for JSON and XML data.
The paper you are referring to gives more flexibility and aims to be a uniform expression for any kind of nested data.

I believe the features that Thomas described are not currently supported in RMLMapper, but could be very useful.

True, those features were more like a proposal for the community to solve this problem. They are not implemented.
The ideas are currently being discussed in the W3C Knowledge Graph Construction Community Group.

I was wondering if there are any plans to include his work?

Work done on the RMLMapper, RMLStreamer, etc. is funded solely by research projects, so the priority of new features is heavily influenced by these projects. Currently, there are no direct plans to implement them, but we welcome any form of collaboration to make this a reality. Feel free to e-mail info@rml.io for collaborations.

Hi @DylanVanAssche ,

Thanks for the very quick and informative reply.

As it happens, my interest in Thomas' work on nested data stems from trying to work with JSON and nested data. Therefore, your comment about how nested data is already possible with JSONPath piqued my interest.

Perhaps I'm missing something (a very real possibility!), but when investigating this, I couldn't see how a JSONPath could work for the data I'm trying to process.

Here's a simplified JSON example, to illustrate what I'm trying to do.

[
 {
   "id": "https://example.org/something",
   "name": "The first thing",
   "addresses": [
     {
       "city": "Canberra"
     },
     {
       "city": "London"
     }
   ]
 },

 {
   "id": "https://example.org/another-thing",
   "name": "The second thing",
   "addresses": [
     {
       "city": "New York"
     }
   ]
 }
]

... and here is the corresponding RDF (without the prefixes), which I'm trying to generate with RMLMapper:

<https://example.org/something> a eg:Thing;
    skos:prefLabel "The first thing";
    eg:hasAddress <https://example.org/something/address-1>;
    eg:hasAddress <https://example.org/something/address-2>.

<https://example.org/something/address-1> a eg:Address;
    eg:city "Canberra".

<https://example.org/something/address-2> a eg:Address;
    eg:city "London".

<https://example.org/another-thing> a eg:Thing;
    skos:prefLabel "The second thing";
    eg:hasAddress <https://example.org/another-thing/address-1>.

<https://example.org/another-thing/address-1> a eg:Address;
    eg:city "New York".

I'm using a simple rml:logicalSource, something like:

  rml:logicalSource [
    rml:source "input.json";
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$.[*]"
  ];

Using this rml:logicalSource, generating the predicates about the top-level items in the JSON (the IRIs of type eg:Thing in the above example) is straightforward.

However, it wasn't clear to me how I could create "new" subject IRIs (e.g., <https://example.org/something/address-1> in the above example) from iterating over a relative JSONPath (e.g., addresses[*] in the above example).

It doesn't seem to be possible with a single rr:TriplesMap. If I've understood correctly, an IRI of type rr:TriplesMap contains exactly one rr:subjectMap predicate, with no possibility of declaring an "inner" rr:TriplesMap.

It also doesn't seem to be possible with multiple rr:TriplesMap IRIs. There doesn't seem to be a way to declare that an IRI of type rr:TriplesMap is (in some sense) "relative" to another IRI of type rr:TriplesMap, so that the first IRI's rml:logicalSource's JSONPath should be executed relative to the second IRI's JSONPath context.
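
To make the intended behaviour concrete, here is a minimal sketch in plain Python (not RML, and not anything RMLMapper provides) of the nested iteration described above. The function name nested_triples is hypothetical, the address IRI pattern is copied from the expected RDF, and prefixed names are kept as plain strings for brevity.

```python
import json

# The simplified JSON example from above.
DATA = json.loads("""
[
  {"id": "https://example.org/something",
   "name": "The first thing",
   "addresses": [{"city": "Canberra"}, {"city": "London"}]},
  {"id": "https://example.org/another-thing",
   "name": "The second thing",
   "addresses": [{"city": "New York"}]}
]
""")


def nested_triples(things):
    """Yield (subject, predicate, object) tuples for things and their addresses."""
    for thing in things:  # outer iterator, playing the role of "$.[*]"
        thing_iri = thing["id"]
        yield (thing_iri, "rdf:type", "eg:Thing")
        yield (thing_iri, "skos:prefLabel", thing["name"])
        # Inner iterator, relative to the current thing, like "addresses[*]".
        for position, address in enumerate(thing["addresses"], start=1):
            # The parent IRI is still in scope here, which is the missing piece.
            address_iri = f"{thing_iri}/address-{position}"
            yield (thing_iri, "eg:hasAddress", address_iri)
            yield (address_iri, "rdf:type", "eg:Address")
            yield (address_iri, "eg:city", address["city"])


triples = list(nested_triples(DATA))
```

The inner loop needs both the parent's id and the position of the address within its parent, which is exactly what a second, independent rr:TriplesMap does not seem to have access to.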

Does this make sense? Am I missing something?

Cheers,
Paul.

Ah yes, you're hitting limitations of JSONPath I'm afraid...
By nested data support in JSONPath I mean that you can map nested data, but as soon as you need to link to higher levels in the iterator, you hit a JSONPath limitation: you cannot go up in a JSONPath expression the way XPath can with parent.

For these cases you could use Thomas' work, yes, if it were implemented.
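
As a rough illustration of that limitation, the sketch below uses plain Python as a stand-in for a JSONPath engine (an assumption made for brevity; no JSONPath library or RMLMapper code is involved). It emulates an iterator such as "$[*].addresses[*]" over a trimmed copy of the example data and shows that the selected items no longer carry anything from their parent.

```python
# Trimmed copy of the example data (names omitted for brevity).
things = [
    {"id": "https://example.org/something",
     "addresses": [{"city": "Canberra"}, {"city": "London"}]},
    {"id": "https://example.org/another-thing",
     "addresses": [{"city": "New York"}]},
]

# Plain-Python stand-in for what "$[*].addresses[*]" selects:
# every address object, flattened across all parents.
flat_addresses = [address for thing in things for address in thing["addresses"]]

# Each selected item only exposes its own keys; the parent's "id" is not among them.
keys_seen = {key for address in flat_addresses for key in address}
```

With only the flattened items to work from, a mapping has no reference it can use to build a subject IRI such as <https://example.org/something/address-1>, because the parent's id is out of reach.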

Was Thomas' work purely theoretical: describing how RML might be extended but without providing an implementation?

Yes, it was a vision on how this problem could be solved to discuss this with the community. There was no implementation.

Ah, OK.

With the data I'm currently working on, this isn't a problem for me (at least, I have a work-around). However, the lack of support for nested data might become a problem in the future: the structure will likely evolve, but it's currently hard to predict in which way.

Although the solution Thomas presented seemed elegant (at least to a non-expert like me), I'm more curious if any solution for handling nested data might become available through RML.

Was there any consensus, from these discussions, on how the community plans to tackle this problem?