afs/rdf-delta

Get Diff

tobiasschweizer opened this issue · 3 comments

Hi,

Sorry if this sounds like a stupid question but how exactly do patches help get a diff?

Here is what I did:

I have two RDF files. Let's assume that I am pulling that data from an external source I don't have any control over.

yesterday: test.ttl:

@prefix schema: <http://schema.org/> .

<https://example.com/209350> a schema:ScholarlyArticle .

today: test2.ttl:

@prefix schema: <http://schema.org/> .

<https://example.com/209350> a schema:ScholarlyArticle ;
      schema:name "my article" .

I created two patches from them (https://afs.github.io/rdf-delta/cmds.html):

  • ./rdf-delta-cmds/bin/dcmd rdf2patch test.ttl > patch.txt
  • ./rdf-delta-cmds/bin/dcmd rdf2patch test2.ttl > patch2.txt

patch.txt:

H id <uuid:9224eb1a-2166-4ec9-8826-d5491d65a99f> .
TX .
PA "schema" "http://schema.org/" .
A <https://example.com/209350> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/ScholarlyArticle> .
TC .

patch2.txt:

H id <uuid:f52dff08-a9e5-404a-ab56-e52c9b93cf42> .
TX .
PA "schema" "http://schema.org/" .
A <https://example.com/209350> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/ScholarlyArticle> .
A <https://example.com/209350> <http://schema.org/name> "my article" .
TC .

What I would like to know is that <https://example.com/209350> <http://schema.org/name> "my article" was added when comparing patch 1 and patch 2 (diff between test.ttl and test2.ttl).

This is what I did:
./rdf-delta-cmds/bin/dcmd patch2rdf patch.txt patch2.txt

@prefix schema: <http://schema.org/> .

<https://example.com/209350>
        a            schema:ScholarlyArticle ;
        schema:name  "my article" .

So I think I am missing something fundamental. Is there some example that would clarify this kind of usage?

Thanks a lot and sorry if I got this concept all wrong!

afs commented

Hi @tobiasschweizer,

An RDF patch is a sequence of operations. One way of generating a patch is to record the adds and deletes of triples as a graph is modified in a transaction.

The sequence is ordered: A then D of a triple results in no triple in the modified graph, whereas D then A of the same triple results in the triple being in the modified graph.

To get from one graph to another, the patch is applied operation by operation.
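To illustrate why the order matters, here is a minimal sketch in Python. It assumes a graph can be modelled as a plain set of triples and a patch as an ordered list of `("A" | "D", triple)` pairs; `apply_patch` is a hypothetical helper for this illustration, not part of rdf-delta.

```python
# Minimal sketch: a graph is a set of triples, a patch is an ordered list of
# ("A" | "D", triple) operations. All names here are illustrative only.

def apply_patch(graph, operations):
    """Return a new set of triples after applying the operations in order."""
    result = set(graph)
    for op, triple in operations:
        if op == "A":
            result.add(triple)
        elif op == "D":
            result.discard(triple)  # deleting an absent triple is a no-op
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result


t = ("<https://example.com/209350>", "<http://schema.org/name>", '"my article"')

add_then_delete = apply_patch(set(), [("A", t), ("D", t)])
delete_then_add = apply_patch(set(), [("D", t), ("A", t)])

print(t in add_then_delete)  # False
print(t in delete_then_add)  # True
```

The same two operations give different graphs depending on their order, which is why a patch is a sequence rather than a pair of sets.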

It's not a diff in the sense of something generated by comparing two graphs.

It might be possible to take a patch file and work out the net effect and produce the equivalent sets of additions and deletions.
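As a rough sketch of what working out the net effect could look like: for each triple only the last operation in the sequence matters, so the sequence can be collapsed into one set of additions and one set of deletions. The `(op, triple)` representation and the `net_effect` name are assumptions made for this example, not something rdf-delta provides.

```python
def net_effect(operations):
    """Collapse an ordered list of ("A" | "D", triple) operations.

    Only the last operation on each triple determines its final state,
    so the result is a pair (additions, deletions) of disjoint sets.
    """
    last_op = {}
    for op, triple in operations:
        if op not in ("A", "D"):
            raise ValueError(f"unknown operation: {op!r}")
        last_op[triple] = op  # later operations overwrite earlier ones
    additions = {t for t, op in last_op.items() if op == "A"}
    deletions = {t for t, op in last_op.items() if op == "D"}
    return additions, deletions


ops = [("A", "t1"), ("A", "t2"), ("D", "t1"), ("D", "t3"), ("A", "t3")]
additions, deletions = net_effect(ops)

print(sorted(additions))  # ['t2', 't3']
print(sorted(deletions))  # ['t1']
```

Note that this is the net effect of the patch on its own. An addition may still be a no-op against a particular graph that already contains the triple, so it is not necessarily a minimal diff between two specific graphs.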

rdf2patch is a helper tool to turn a graph into a sequence of additions. Applying that sequence is equivalent to adding the graph into another graph.

tobiasschweizer commented

Hi @afs,

Thanks a lot for the clarification.

> It's not a diff in the sense of something generated by comparing two graphs.

Ok, so this means figuring out what's added or deleted from the graph with a certain transaction is out of rdf-delta's scope.
What I've been working with so far is rdflib, see https://www.w3.org/2001/sw/wiki/How_to_diff_RDF#RDFLib.
However, I think this is going to be slow for large datasets.
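For comparison, a naive diff outside of rdf-delta can be sketched with plain set differences. This assumes both files are available as N-Triples with one triple per line, serialized consistently (same whitespace and escaping), and that they contain no blank nodes; with blank nodes, a graph-isomorphism-aware comparison like the rdflib approach linked above would be needed. The helper names are hypothetical.

```python
def read_ntriples(text):
    """Return the set of non-empty, non-comment lines of an N-Triples string."""
    lines = (line.strip() for line in text.splitlines())
    return {line for line in lines if line and not line.startswith("#")}


def diff_triples(old_text, new_text):
    """Return (added, removed) as sets of N-Triples lines."""
    old = read_ntriples(old_text)
    new = read_ntriples(new_text)
    return new - old, old - new


RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
SUBJECT = "<https://example.com/209350>"

yesterday = f"{SUBJECT} {RDF_TYPE} <http://schema.org/ScholarlyArticle> .\n"
today = yesterday + f'{SUBJECT} <http://schema.org/name> "my article" .\n'

added, removed = diff_triples(yesterday, today)

for line in sorted(added):
    print("added:  ", line)
for line in sorted(removed):
    print("removed:", line)
```

Set difference on hashed lines is roughly linear in the number of triples, but it only works under the assumptions stated above.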

I have adapted my previous example:

patch.txt for <https://example.com/209350> a schema:ScholarlyArticle . (the initial graph consists just of this triple)

H id <uuid:b8e8c784-4bf8-4860-a2a9-48d6522979c7> .
TX .
PA "schema" "http://schema.org/" .
A <https://example.com/209350> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/ScholarlyArticle> .
TC .

patch2.txt for <https://example.com/209350> <http://schema.org/name> "my article" . (an additional triple is added to the graph)

H id <uuid:a532f29b-86ad-41c1-9181-f263b7884314> .
TX .
PA "schema" "http://schema.org/" .
A <https://example.com/209350> <http://schema.org/name> "my article" .
TC .

./rdf-delta-cmds/bin/dcmd patch2rdf patch.txt patch2.txt results in

@prefix schema: <http://schema.org/> .

<https://example.com/209350>
        a            schema:ScholarlyArticle ;
        schema:name  "my article" .

So this is like replaying an event log by re-applying the single transactions.
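The replay idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how `dcmd patch2rdf` is implemented: it interprets just the `A` and `D` rows, skips everything else (`H`, `TX`, `PA`, `TC`), and keeps each triple as the raw text between the row code and the final ` .`. The `replay` function is a hypothetical name.

```python
PATCH_1 = """H id <uuid:b8e8c784-4bf8-4860-a2a9-48d6522979c7> .
TX .
PA "schema" "http://schema.org/" .
A <https://example.com/209350> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/ScholarlyArticle> .
TC .
"""

PATCH_2 = """H id <uuid:a532f29b-86ad-41c1-9181-f263b7884314> .
TX .
PA "schema" "http://schema.org/" .
A <https://example.com/209350> <http://schema.org/name> "my article" .
TC .
"""


def replay(patch_texts):
    """Apply patch texts in order and return the resulting set of triples."""
    graph = set()
    for text in patch_texts:
        for line in text.splitlines():
            code, _, rest = line.strip().partition(" ")
            if code not in ("A", "D"):
                continue  # header, transaction and prefix rows are skipped
            rest = rest.strip()
            if rest.endswith("."):
                rest = rest[:-1].rstrip()  # drop the row terminator
            if code == "A":
                graph.add(rest)
            else:
                graph.discard(rest)
    return graph


graph = replay([PATCH_1, PATCH_2])
for triple in sorted(graph):
    print(triple)
```

Replaying both patches yields the two triples of the Turtle output shown above.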

I am not sure how to create a delete patch, as I have only used this command:

r2p | rdf2patch | dcmd r2p FILE | Convert RDF to an addition patch

https://afs.github.io/rdf-delta/cmds.html
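One way to get a delete patch without a dedicated command might be to write the patch text directly, mirroring the layout of the patch files shown earlier and using `D` rows for deletions. The sketch below assumes triples are given as N-Triples-style strings without the trailing dot; `make_patch` is a hypothetical helper, and its output has not been checked against the rdf-delta tooling.

```python
import uuid


def make_patch(additions=(), deletions=()):
    """Build patch text with one transaction: D rows first, then A rows."""
    lines = [f"H id <uuid:{uuid.uuid4()}> .", "TX ."]
    lines += [f"D {triple} ." for triple in deletions]
    lines += [f"A {triple} ." for triple in additions]
    lines.append("TC .")
    return "\n".join(lines) + "\n"


name_triple = '<https://example.com/209350> <http://schema.org/name> "my article"'
patch = make_patch(deletions=[name_triple])
print(patch)
```

The header id is freshly generated on each call, so the printed patch differs between runs in that line only.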

Thanks again and have a good week!

afs commented

> Ok, so this means figuring out what's added or deleted from the graph with a certain transaction is out of rdf-delta's scope.

DatasetGraphChanges is a wrapper around a DatasetGraph which, in conjunction with RDFChanges, records changes as they happen.

What is out of scope is taking two datasets and calculating the changes without tracking the changes at the time.
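The difference between the two approaches can be illustrated with a small Python analogy of a recording wrapper. This is not the Jena or rdf-delta Java API; `RecordingGraph` is a hypothetical class, and the choice to log only operations that actually change the graph is a design decision of this sketch.

```python
class RecordingGraph:
    """A set of triples that logs every effective add/delete as it happens."""

    def __init__(self, triples=()):
        self._triples = set(triples)
        self.changes = []  # ordered log of ("A" | "D", triple)

    def add(self, triple):
        if triple not in self._triples:
            self._triples.add(triple)
            self.changes.append(("A", triple))

    def delete(self, triple):
        if triple in self._triples:
            self._triples.remove(triple)
            self.changes.append(("D", triple))

    def __contains__(self, triple):
        return triple in self._triples


type_triple = ("ex:209350", "rdf:type", "schema:ScholarlyArticle")
name_triple = ("ex:209350", "schema:name", '"my article"')

g = RecordingGraph([type_triple])
g.add(name_triple)
g.add(name_triple)  # already present, so nothing is logged
print(g.changes)
```

Here the change log is a by-product of modifying the graph, so no comparison of a before and an after state is ever needed.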