Get Diff
tobiasschweizer opened this issue · 3 comments
Hi,
Sorry if this sounds like a stupid question but how exactly do patches help get a diff?
Here is what I did:
I have two RDF files. Let's assume that I am pulling that data from an external source I don't have any control over.
yesterday: test.ttl:
@prefix schema: <http://schema.org/> .
<https://example.com/209350> a schema:ScholarlyArticle .
today: test2.ttl:
@prefix schema: <http://schema.org/> .
<https://example.com/209350> a schema:ScholarlyArticle ;
schema:name "my article" .
I created two patches from them (https://afs.github.io/rdf-delta/cmds.html):
./rdf-delta-cmds/bin/dcmd rdf2patch test.ttl > patch.txt
./rdf-delta-cmds/bin/dcmd rdf2patch test2.ttl > patch2.txt
patch.txt:
H id <uuid:9224eb1a-2166-4ec9-8826-d5491d65a99f> .
TX .
PA "schema" "http://schema.org/" .
A <https://example.com/209350> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/ScholarlyArticle> .
TC .
patch2.txt:
H id <uuid:f52dff08-a9e5-404a-ab56-e52c9b93cf42> .
TX .
PA "schema" "http://schema.org/" .
A <https://example.com/209350> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/ScholarlyArticle> .
A <https://example.com/209350> <http://schema.org/name> "my article" .
TC .
What I would like to know is that <https://example.com/209350> <http://schema.org/name> "my article"
was added when comparing patch 1 and patch 2 (diff between test.ttl and test2.ttl).
This is what I did:
./rdf-delta-cmds/bin/dcmd patch2rdf patch.txt patch2.txt
@prefix schema: <http://schema.org/> .
<https://example.com/209350>
a schema:ScholarlyArticle ;
schema:name "my article" .
So I think I am missing something fundamental. Is there some example that would clarify this kind of usage?
Thanks a lot and sorry if I got this concept all wrong!
Hi @tobiasschweizer,
An RDF patch is a sequence of operations. One way of generating a patch is to record the additions and deletions of triples as a graph is modified in a transaction.
It is an ordered sequence: A then D of a triple results in no triple in the modified graph, whereas D then A of a triple results in the triple being present in the modified graph.
To get from one graph to another, the patch is applied operation by operation.
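To make the ordering point concrete, here is a minimal Python sketch that replays patch rows against a plain set of triples. It is not the rdf-delta implementation: `apply_patch` is a hypothetical helper, triples are kept as raw strings, and only `A` and `D` rows are interpreted (header, prefix and transaction rows are skipped).

```python
def apply_patch(graph, patch_text):
    """Return a new set of triples after applying the patch rows in order.

    Assumption: each row is on one line and ends with " ."; a triple is
    stored as the raw text between the row code and the final " .".
    """
    result = set(graph)
    for line in patch_text.splitlines():
        line = line.strip()
        if not line.endswith(" ."):
            continue
        code, _, rest = line.partition(" ")
        triple = rest[:-2].strip()  # drop the trailing " ."
        if code == "A":
            result.add(triple)
        elif code == "D":
            result.discard(triple)
        # any other row code (H, TX, TC, PA, ...) is ignored in this sketch
    return result


T = '<https://example.com/209350> <http://schema.org/name> "my article"'

add_then_delete = f"TX .\nA {T} .\nD {T} .\nTC ."
delete_then_add = f"TX .\nD {T} .\nA {T} .\nTC ."

after_ad = apply_patch(set(), add_then_delete)  # triple is absent afterwards
after_da = apply_patch(set(), delete_then_add)  # triple is present afterwards
```

The same two operations in a different order leave the graph in a different state, which is why a patch has to be read as a log rather than as a pair of unordered sets.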
It's not a diff in the sense of something generated by comparing two graphs.
It might be possible to take a patch file and work out the net effect and produce the equivalent sets of additions and deletions.
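One way this could be sketched, assuming set semantics for triples: for each triple only the last operation in the patch matters, so the net effect is "last operation wins". The helper `net_effect` below is hypothetical and not part of rdf-delta; it uses the same naive line-based reading of `A` and `D` rows and ignores everything else.

```python
def net_effect(patch_text):
    """Collapse a patch into (added, deleted) sets of raw triple strings.

    "added" holds triples that are present after the patch is applied,
    "deleted" holds triples that are absent afterwards. Without the starting
    graph we cannot tell whether each operation actually changed anything.
    """
    last_op = {}
    for line in patch_text.splitlines():
        code, _, rest = line.strip().partition(" ")
        if code in ("A", "D") and rest.endswith(" ."):
            last_op[rest[:-2].strip()] = code  # later rows overwrite earlier ones
    added = {t for t, op in last_op.items() if op == "A"}
    deleted = {t for t, op in last_op.items() if op == "D"}
    return added, deleted


TYPE = ("<https://example.com/209350> "
        "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
        "<http://schema.org/ScholarlyArticle>")
NAME = '<https://example.com/209350> <http://schema.org/name> "my article"'

patch = f"TX .\nA {TYPE} .\nA {NAME} .\nD {NAME} .\nTC ."
added, deleted = net_effect(patch)
```

Here `added` contains only the type triple and `deleted` contains only the name triple, because the name was added and then deleted within the same log.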
rdf2patch is a helper tool to turn a graph into a sequence of additions. Applying that sequence is equivalent to adding the graph into another graph.
Hi @afs,
Thanks a lot for the clarification.
It's not a diff in the sense of something generated by comparing two graphs.
Ok, so this means figuring out what's added or deleted from the graph with a certain transaction, is out of rdf-delta's scope.
What I've been working with so far is rdflib, see https://www.w3.org/2001/sw/wiki/How_to_diff_RDF#RDFLib.
However, I think this is going to be slow for large datasets.
I have adapted my previous example:
patch.txt for <https://example.com/209350> a schema:ScholarlyArticle .
(the initial graph consists just of this triple)
H id <uuid:b8e8c784-4bf8-4860-a2a9-48d6522979c7> .
TX .
PA "schema" "http://schema.org/" .
A <https://example.com/209350> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/ScholarlyArticle> .
TC .
patch2.txt for <https://example.com/209350> <http://schema.org/name> "my article" .
(an additional triple is added to the graph)
H id <uuid:a532f29b-86ad-41c1-9181-f263b7884314> .
TX .
PA "schema" "http://schema.org/" .
A <https://example.com/209350> <http://schema.org/name> "my article" .
TC .
./rdf-delta-cmds/bin/dcmd patch2rdf patch.txt patch2.txt
results in
@prefix schema: <http://schema.org/> .
<https://example.com/209350>
a schema:ScholarlyArticle ;
schema:name "my article" .
So this is like replaying an event log by re-applying the single transactions.
Not sure how to create a delete patch as I have only used this
r2p | rdf2patch | dcmd r2p FILE | Convert RDF to an addition patch
https://afs.github.io/rdf-delta/cmds.html
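For what it's worth, a patch containing `D` rows can also be written by hand. The Python sketch below compares two snapshots and emits one, under loud assumptions: `diff_to_patch` is a hypothetical helper (not an rdf-delta command), triples are N-Triples-style strings already loaded into sets, and there are no blank nodes, so a plain set difference is enough. The row layout simply copies the patch files shown above.

```python
import uuid


def diff_to_patch(old, new):
    """Build patch text that turns the triple set `old` into `new`."""
    lines = [f"H id <uuid:{uuid.uuid4()}> .", "TX ."]
    lines += [f"D {t} ." for t in sorted(old - new)]  # triples that disappeared
    lines += [f"A {t} ." for t in sorted(new - old)]  # triples that appeared
    lines.append("TC .")
    return "\n".join(lines)


TYPE = ("<https://example.com/209350> "
        "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
        "<http://schema.org/ScholarlyArticle>")
NAME = '<https://example.com/209350> <http://schema.org/name> "my article"'

yesterday = {TYPE}
today = {TYPE, NAME}

forward = diff_to_patch(yesterday, today)   # contains an A row for NAME
backward = diff_to_patch(today, yesterday)  # contains a D row for NAME
print(forward)
```

This does the comparison up front with both snapshots held in memory, so it has the same scaling concern for large datasets as any other compare-two-graphs approach.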
Thanks again and have a good week!
Ok, so this means figuring out what's added or deleted from the graph with a certain transaction, is out of rdf-delta's scope.
DatasetGraphChanges is a wrapper to a DatasetGraph which, in conjunction with RDFChanges, records changes as they happen.
What is out of scope is taking two datasets and calculating the changes without tracking the changes at the time.
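The "record changes as they happen" idea can be illustrated with a small Python analogy. This is not the Jena or rdf-delta API: `RecordingGraph` is a hypothetical class, and logging only the operations that actually change the set is a choice made for this sketch.

```python
class RecordingGraph:
    """A set of triples that logs every effective add and delete."""

    def __init__(self, triples=()):
        self._triples = set(triples)
        self.log = []  # ordered list of ("A" | "D", triple)

    def add(self, triple):
        if triple not in self._triples:
            self._triples.add(triple)
            self.log.append(("A", triple))

    def delete(self, triple):
        if triple in self._triples:
            self._triples.remove(triple)
            self.log.append(("D", triple))

    def triples(self):
        return set(self._triples)


TYPE = ("<https://example.com/209350> "
        "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
        "<http://schema.org/ScholarlyArticle>")
NAME = '<https://example.com/209350> <http://schema.org/name> "my article"'

g = RecordingGraph([TYPE])
g.add(NAME)
g.add(NAME)     # no effect, so nothing is logged
g.delete(TYPE)
```

After these calls `g.log` holds one `A` entry for the name triple followed by one `D` entry for the type triple. The log exists only because every modification went through the wrapper; two snapshots pulled from an external source carry no such history.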