RMLio/yarrrml-parser

Cross-product of links between instances

namedgraph opened this issue · 18 comments

Issue type: 🐛 Bug

sources:
  Smth:
    access: input.ndjson
    referenceFormulation: jsonpath
    iterator: "$.key[*]"
mappings:
  Concept:
    sources: Smth
    graph: smth:$(@)
    s: smth:$(@)#this
    po:
      - [ a, skos:Concept ]
      - [ skos:prefLabel, $(@) ]
  Document:
    sources: Smth
    graph: smth:$(@)
    s: smth:$(@)
    po:
      - [ a, foaf:Document ]
      - p: foaf:primaryTopic
        o:
          - mapping: Concept

I am getting a cross-product in the output, i.e. if there are N rows, I'm getting this Document output:

<instance1> foaf:primaryTopic <instance1#this> .
<instance1> foaf:primaryTopic <instance2#this> .
...
<instance1> foaf:primaryTopic <instanceN#this> .
<instance2> foaf:primaryTopic <instance1#this> .
<instance2> foaf:primaryTopic <instance2#this> .
...
<instance2> foaf:primaryTopic <instanceN#this> .
...
<instanceN> foaf:primaryTopic <instance1#this> .
<instanceN> foaf:primaryTopic <instance2#this> .
...
<instanceN> foaf:primaryTopic <instanceN#this> .

Whereas I only want to "pair" the respective Document and Concept instances:

<instance1> foaf:primaryTopic <instance1#this> .
<instance2> foaf:primaryTopic <instance2#this> .
...
<instanceN> foaf:primaryTopic <instanceN#this> .

Is there a way to express what I need with YARRRML?

Hi @namedgraph, what does your input data look like?

One row looks like this:

{"key":["aaaaaa","bbbbbbbb","cccc","ddddddd"]}

I added sources to the mapping BTW.

It's only possible if there is a unique way to identify a row, so that you only link the rows that are the same. For example, if every row has an index, you can add a condition to your mapping so that it only links rows whose indexes are equal.

I see... I might need to add that.

These are annoying shortcomings IMO. It would not be a problem using XSLT, for example.

But wait... I would need a different iterator then?

No, that is not needed.

But in the mapping I'm using the array item (e.g. "cccc") as the value: $(@)
It's those values I need to compare, not row IDs. If I were to add IDs for those values, would I need to change the whole JSON structure within the array?

No, that's not needed. The following works for me

sources:
  Smth:
    access: data.json
    referenceFormulation: jsonpath
    iterator: "$.key[*]"
mappings:
  Concept:
    sources: Smth
    s: ex:$(@)#this
    po:
      - [ a, skos:Concept ]
      - [ skos:prefLabel, $(@) ]
  Document:
    sources: Smth
    s: ex:$(@)
    po:
      - [ a, foaf:Document ]
      - p: foaf:primaryTopic
        o:
          - mapping: Concept
            condition:
              function: equal
              parameters:
                - [str1, $(@)]
                - [str2, $(@)]

Thanks! Will try.

I considered this, but it wasn't obvious to me how comparing $(@) to $(@) could ever be false?

We compare every element in the array (via Concept) with every element in the array (via Document). So we have

  • aaa and aaa: this is what you want, so true.
  • aaa and bbb: you don't want this link, so false.
  • aaa and ccc: again false
  • bbb and aaa: false
  • bbb and bbb: true, we want to link these.

That part I understand. But doesn't that mean that $(@) in str1 refers to a different value than $(@) in str2?

In str1 we refer to the rows in Document and in str2 we refer to the rows in Concept.

That explains the result, and it will be useful in my case. But my point is that it's counter-intuitive and unusual for the same variable ($(@)) to refer to different values in the same context.

I just tried your suggestion with condition and it doesn't work for me -- the result is the same as without it.
Are you sure you tested with more than one row of JSONL?

Well, it's not the same context actually, but for equal the context has a default if the user doesn't provide one. This is explained here:

But when a condition is used an extra value can be given to a parameter of a function. This is either s or o. s means that the value of the parameter is coming from the subject of the relationship, while o means that the value is coming from the object of the relationship. The default value is s. In this example it would result in relationships between every person and their projects.
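
As a sketch of what that default amounts to, the condition from the working mapping earlier in this thread can be written with the optional third value spelled out. This assumes, following the explanation above, that str1 is read from the subject side (Document) and str2 from the object side (Concept). The ex: prefix is the placeholder used in that mapping, and only the Document mapping is repeated here.

```yaml
mappings:
  Document:
    sources: Smth
    s: ex:$(@)
    po:
      - [ a, foaf:Document ]
      - p: foaf:primaryTopic
        o:
          - mapping: Concept
            condition:
              function: equal
              parameters:
                # s: the value comes from the subject side of the link (Document rows)
                - [str1, $(@), s]
                # o: the value comes from the object side of the link (Concept rows)
                - [str2, $(@), o]
```

Written this way, it is visible that the two $(@) references are evaluated against different rows, which is why the comparison can be false.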

Regarding JSONL, by default only standard JSON is supported.

Disregard the JSONL comment...

I managed to reproduce your condition-based results using the rmlio/rmlmapper-java Docker image, but not in the Java code (using be.ugent.rml:rmlmapper:6.1.3) 🤔

Turns out it's a bug in our custom executor 😅 Sorry for the noise.