afs/rdf-delta

Should there be an RDF representation for an RDF Patch?

Closed this issue · 3 comments

This is really interesting!

I was trying to do something similar but more general: send a SPARQL-like query to an extended Fuseki server and get back an HTTP connection that stays open and reports the query's results, updated "continuously" as changes come in.

The protocol I was getting to the point of playing with would:

  1. Use RDF (in any encoding) rather than a specific textual format like RDF Patch.
  2. Describe changes to a SPARQL-like results set, rather than all triples.

I think both of these would make the applications I'm aiming for much easier to write (in JavaScript, probably): instead of parsing RDF Patch, I could decode JSON chunks that describe an ever-growing RDF graph which encodes how my result set changed.

I think one way of thinking about this would be to "expand" RDF Patch into an RDF graph that describes how a specific query's result set changed (here, the universal "SELECT * WHERE { ?s ?p ?o }" query). I was thinking of something like this:

```turtle
_:tx a patch:transaction .
_:tx patch:adds _:binding .
_:binding patch:lets [ patch:variable "s" ; patch:value some:subject ] .
_:binding patch:lets [ patch:variable "p" ; patch:value some:predicate ] .
_:binding patch:lets [ patch:variable "o" ; patch:value some:object ] .
```

That's more verbose than RDF Patch, but I think the generality is worth it.
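To put a number on the size trade-off, here is a minimal Python sketch that expands one RDF Patch-style change into the binding graph proposed above. The `patch:` terms come from the example. The following are assumptions for illustration: `patch:deletes` for removals, the `_:bN` labelling, and the function name `expand_change`. Terms are treated as opaque strings.

```python
import itertools

VARIABLES = ("s", "p", "o")  # the variables of SELECT * WHERE { ?s ?p ?o }

def expand_change(op, s, p, o, counter):
    """Describe one triple change as the proposed binding graph.

    Hypothetical vocabulary: patch:transaction, patch:adds, patch:lets,
    patch:variable and patch:value come from the proposal; patch:deletes
    is an assumed counterpart for "D" rows. Terms are opaque strings.
    """
    tx = f"_:b{next(counter)}"
    binding = f"_:b{next(counter)}"
    link = "patch:adds" if op == "A" else "patch:deletes"
    triples = [(tx, "a", "patch:transaction"), (tx, link, binding)]
    for name, value in zip(VARIABLES, (s, p, o)):
        let = f"_:b{next(counter)}"  # one blank node per patch:lets entry
        triples += [
            (binding, "patch:lets", let),
            (let, "patch:variable", f'"{name}"'),
            (let, "patch:value", value),
        ]
    return triples

counter = itertools.count()
graph = expand_change("A", "some:subject", "some:predicate", "some:object", counter)
for triple in graph:
    print(" ".join(triple), ".")
```

If each change gets its own transaction node, a single patch row becomes 11 triples.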

afs commented

Interesting - tracking changes to a result set would provide a form of continuous query, and also conditional events, so it would be useful in various ways.

The abstraction in RDF Patch of adds and deletes fits that model, though I don't think one syntax can do everything. Directly supporting too many use cases compromises efficiency as well as maintainability.

An RDF representation of RDF Patch would be interesting - whether as a direct translation of the base form, or adapted to specific use cases such as your example of a web client, where JSON/JSON-LD would be the delivery format.

RDF Patch's goals include keeping two datasets exactly in step: the exact same results from a query on any replica, which means keeping the blank nodes aligned.

No RDF format can do that - RDF Patch is explicitly not an RDF syntax, whatever it might look like. It is also intended for server use, where efficiency matters.
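Here is a small Python sketch of why blank node labels matter for replicas. It rests on simplifying assumptions: changes are `(op, s, p, o)` tuples, a dataset is a Python `set`, and `load_as_rdf` is a hypothetical stand-in for any RDF parser, which treats `_:` labels as local to the document and mints fresh blank nodes. The sketch does not model how RDF Patch actually writes blank node labels.

```python
import uuid

def apply_patch(dataset, rows):
    """Apply ("A"|"D", s, p, o) rows, keeping blank node labels exactly as written."""
    for op, s, p, o in rows:
        if op == "A":
            dataset.add((s, p, o))
        elif op == "D":
            dataset.discard((s, p, o))
    return dataset

def load_as_rdf(dataset, triples):
    """For contrast: a parser treats _: labels as document-local and mints fresh nodes."""
    fresh = {}
    def term(t):
        if t.startswith("_:"):
            return fresh.setdefault(t, "_:" + uuid.uuid4().hex)
        return t
    for s, p, o in triples:
        dataset.add((term(s), term(p), term(o)))
    return dataset

rows = [
    ("A", "_:x", "<http://example/p>", '"1"'),
    ("A", "<http://example/s>", "<http://example/q>", "_:x"),
]
replica1, replica2 = apply_patch(set(), rows), apply_patch(set(), rows)
print(replica1 == replica2)  # True: identical, including blank node labels

triples = [row[1:] for row in rows]
copy1, copy2 = load_as_rdf(set(), triples), load_as_rdf(set(), triples)
print(copy1 == copy2)  # False: isomorphic graphs, but the labels differ

# A later delete naming _:x matches in the replica but not in the RDF copy.
delete = [("D", "_:x", "<http://example/p>", '"1"')]
print(len(apply_patch(replica1, delete)), len(apply_patch(copy1, delete)))  # 1 2
```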

RDF Patch streams both when writing (as changes are made) and when reading a patch, so it works for a single transaction that changes 100+ million triples.
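The streaming property can be sketched with Python generators. This is a simplified illustration, not the rdf-delta implementation. The names `parse_rows`, `apply_stream` and `generated_patch` are hypothetical, and terms are assumed to contain no spaces. The `TX`/`TC` lines only stand in for transaction markers, and the sketch skips them.

```python
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, str, str, str]

def parse_rows(lines: Iterable[str]) -> Iterator[Row]:
    """Lazily yield (op, s, p, o) for each "A"/"D" line, one line at a time.

    Simplified: terms must not contain spaces, and other rows (such as the
    TX/TC transaction markers) are skipped rather than interpreted.
    """
    for line in lines:
        parts = line.split()
        if len(parts) == 5 and parts[0] in ("A", "D") and parts[4] == ".":
            yield parts[0], parts[1], parts[2], parts[3]

def apply_stream(dataset: set, lines: Iterable[str]) -> int:
    """Apply each change as soon as its line is read.

    When `lines` is lazy (e.g. an open file), only one line of the patch
    is held in memory at a time.
    """
    applied = 0
    for op, s, p, o in parse_rows(lines):
        if op == "A":
            dataset.add((s, p, o))
        else:
            dataset.discard((s, p, o))
        applied += 1
    return applied

def generated_patch(n: int) -> Iterator[str]:
    """Stand-in for reading a large patch file line by line."""
    yield "TX ."
    for i in range(n):
        yield f'A <http://example/s{i}> <http://example/p> "{i}" .'
    yield "TC ."

dataset = set()
print(apply_stream(dataset, generated_patch(100_000)))  # 100000
```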

(I have been meaning to write down a "context" section and also about design choices so let me start ...)

RDF "Structures"

Many proposals for higher-level structure in RDF, such as RDF containers (alt/seq/bag) and RDF collections (lists), share one problem - what if the structure is written wrongly? Checking requires reading the whole graph (up to the point where the blank node label's scope closes) before you can say for certain "this RDF list is good". This is not good for processing at scale.
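As an illustration of the checking problem, this Python sketch walks an RDF collection (`rdf:first`/`rdf:rest`, ending in `rdf:nil`) over a complete list of triples. `list_members` is a hypothetical helper. The point is that a list which looks well-formed can be broken by one more triple read later.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = f"<{RDF}first>", f"<{RDF}rest>", f"<{RDF}nil>"

def list_members(triples, head):
    """Walk the RDF collection starting at `head`; raise ValueError if it is malformed.

    Only meaningful on the complete set of triples: one more rdf:first or
    rdf:rest arriving later can turn a good list into a bad one.
    """
    members, seen, node = [], set(), head
    while node != NIL:
        if node in seen:
            raise ValueError(f"cycle at {node}")
        seen.add(node)
        firsts = [o for s, p, o in triples if s == node and p == FIRST]
        rests = [o for s, p, o in triples if s == node and p == REST]
        if len(firsts) != 1 or len(rests) != 1:
            raise ValueError(f"{node}: {len(firsts)} rdf:first, {len(rests)} rdf:rest")
        members.append(firsts[0])
        node = rests[0]
    return members

good = [
    ("_:l1", FIRST, '"a"'), ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, '"b"'), ("_:l2", REST, NIL),
]
print(list_members(good, "_:l1"))  # ['"a"', '"b"']
try:
    list_members(good + [("_:l2", FIRST, '"c"')], "_:l1")
except ValueError as err:
    print(err)  # _:l2: 2 rdf:first, 1 rdf:rest
```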

What happens if a second

```turtle
_:binding patch:lets [ patch:variable "s" ; patch:value some:otherSubject ] .
```

is in the change? Or if one is missing? To make a good client library, checking is important, because asking the user to be careful isn't great.
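The same issue applies to the proposed binding structure. Below is a minimal Python validator. It uses the hypothetical `patch:` vocabulary from the proposal, with variable names as plain strings rather than literals. It can only report an extra or missing `patch:lets` entry once it has every triple. `check_bindings` and `let` are illustrative names.

```python
from collections import defaultdict

EXPECTED = ["o", "p", "s"]  # sorted names of ?s ?p ?o

def check_bindings(triples):
    """List the problems in a complete set of binding triples (empty list = OK).

    Uses the hypothetical patch: vocabulary from the proposal; variable names
    are plain strings here rather than RDF literals.
    """
    lets = defaultdict(list)
    props = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        if p == "patch:lets":
            lets[s].append(o)
        elif p in ("patch:variable", "patch:value"):
            props[s][p].append(o)
    problems = []
    for binding, nodes in lets.items():
        names = []
        for node in nodes:
            var, val = props[node]["patch:variable"], props[node]["patch:value"]
            if len(var) != 1 or len(val) != 1:
                problems.append(f"{node}: {len(var)} variables, {len(val)} values")
            names.extend(var)
        if sorted(names) != EXPECTED:
            problems.append(f"{binding}: variables {sorted(names)}")
    return problems

def let(binding, node, name, value):
    """Hypothetical helper: the three triples for one patch:lets entry."""
    return [(binding, "patch:lets", node), (node, "patch:variable", name), (node, "patch:value", value)]

good = (let("_:binding", "_:l1", "s", "some:subject")
        + let("_:binding", "_:l2", "p", "some:predicate")
        + let("_:binding", "_:l3", "o", "some:object"))
extra = good + let("_:binding", "_:l4", "s", "some:otherSubject")
missing = [t for t in good if "_:l2" not in t]
print(check_bindings(good))     # []
print(check_bindings(extra))    # ["_:binding: variables ['o', 'p', 's', 's']"]
print(check_bindings(missing))  # ["_:binding: variables ['o', 's']"]
```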

Other approaches to change propagation

There have been other change formats, and IMO each addresses a different use case. Each works for its use case, and the lesson I take away is that one solution does not meet all usage, even at the functional level.

(Not an exhaustive list - some take the same approach as one of the above. What other examples of different approaches are there?)

Thanks for your response! That's a very good explanation of why the RDF-Patch-in-RDF proposal is problematic, and indeed I hadn't thought enough about blank nodes in particular.

I'm not quite sure I follow your argument about checking the RDF graph for validity; with an RDF Patch log, you can't tell that you're not actually getting garbage until you've completed reading it, so how is that different from the RDF case?

afs commented

At the level of adding a triple, RDF Patch is a single entry:

```
A <http://example/s> <http://example/p> "object" .
```

That can't break without it being a syntax parse error. No amount of other content in the patch can break it. Later entries can undo it or make it silly RDF - a higher-level issue - but as a change description it is closed and can be executed as soon as the "." is seen.
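That property can be sketched in Python. Each row is parsed on its own with a simplified regular expression. The expression is an assumption: it covers IRIs, `_:` labels and basic literals, not the full RDF Patch grammar. A row either fails as a syntax error or is complete and can be applied at once. `parse_row` is a hypothetical name.

```python
import re

# Simplified term grammar (an assumption, not the full RDF Patch grammar):
# <IRI>, _:label, or "literal" with an optional @lang or ^^<datatype>.
TERM = r'(<[^>\s]*>|_:\w+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^<[^>\s]*>)?)'
ROW = re.compile(rf"^([AD])\s+{TERM}\s+{TERM}\s+{TERM}\s*\.\s*$")

def parse_row(line):
    """Parse one add/delete row in isolation; the only failure is a syntax error."""
    match = ROW.match(line)
    if match is None:
        raise ValueError(f"syntax error in row: {line!r}")
    return match.groups()

dataset = set()
for line in ['A <http://example/s> <http://example/p> "object" .']:
    op, s, p, o = parse_row(line)  # the row is complete once "." is seen...
    if op == "A":                  # ...so it can be executed immediately
        dataset.add((s, p, o))
    else:
        dataset.discard((s, p, o))
print(dataset)  # {('<http://example/s>', '<http://example/p>', '"object"')}
```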

With an RDF description of a structure, you need an exact set of several triples. Until you have seen all the triples, you don't know there aren't erroneous extra ones, and if the triples go through a hash map, they may not come out in a nice order with all the _:binding triples together. Unfortunately, at scale on the web, bad things do happen.
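By contrast, a consumer of the RDF form has to buffer. This Python sketch uses the hypothetical `patch:` vocabulary and does no validation. It rebuilds the triple changes from binding triples that arrive in arbitrary order, as they might out of a hash map, and it cannot emit anything until the input ends. `changes_from_graph` is an illustrative name.

```python
import random
from collections import defaultdict

def changes_from_graph(triples):
    """Rebuild (s, p, o) changes from the proposed binding description.

    Triples may arrive in any order, so everything is buffered and grouped
    by binding before any change can be emitted. No validation is done.
    """
    lets = defaultdict(list)
    var, val = {}, {}
    for s, p, o in triples:  # the whole input must be consumed first
        if p == "patch:lets":
            lets[s].append(o)
        elif p == "patch:variable":
            var[s] = o
        elif p == "patch:value":
            val[s] = o
    changes = set()
    for binding, nodes in lets.items():
        row = {var[n]: val[n] for n in nodes}
        changes.add((row["s"], row["p"], row["o"]))
    return changes

triples = [
    ("_:binding", "patch:lets", "_:l1"), ("_:l1", "patch:variable", "s"), ("_:l1", "patch:value", "some:subject"),
    ("_:binding", "patch:lets", "_:l2"), ("_:l2", "patch:variable", "p"), ("_:l2", "patch:value", "some:predicate"),
    ("_:binding", "patch:lets", "_:l3"), ("_:l3", "patch:variable", "o"), ("_:l3", "patch:value", "some:object"),
]
random.shuffle(triples)  # arrival order is not guaranteed
print(changes_from_graph(triples))  # {('some:subject', 'some:predicate', 'some:object')}
```

The grouping happens only after the loop has read every triple, so memory grows with the size of the change. A single patch row, by contrast, can be applied and forgotten.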

Turtle's (...) syntax helps a lot for RDF lists, but it does not provide a guarantee. Users prefer their toolkits to notice problems. The data consumer may not have any control over how the content is produced.