afs/rdf-delta

Previous not current

Closed this issue · 9 comments

Hello. First of all, I want to thank you for the work done.
When we deploy delta and patchlog in Kubernetes, we get some warnings.
The deployment is simple: ZooKeeper, one patchlog, and one rdf-delta with an S3-compatible store (MinIO).
How can we fix this?

[2020-03-02 11:41:53] HTTP       WARN  Patch id:1af6b2 : HTTP bad request: Previous not current: log head=id:206536 : patch previous=id:f896d2
[2020-03-02 11:41:53] Delta      WARN  Failed to commit: Previous not current: log head=id:206536 : patch previous=id:f896d2
[2020-03-02 11:41:53] HTTP       WARN  Patch id:1ed17d : HTTP bad request: Previous not current: log head=id:206536 : patch previous=id:f896d2
[2020-03-02 11:41:53] Delta      WARN  Failed to commit: Previous not current: log head=id:206536 : patch previous=id:f896d2
[2020-03-02 11:41:53] HTTP       WARN  Patch id:af7a96 : HTTP bad request: Previous not current: log head=id:206536 : patch previous=id:f896d2
[2020-03-02 11:41:53] Delta      WARN  Failed to commit: Previous not current: log head=id:206536 : patch previous=id:f896d2
....
[2020-03-02 11:51:44] Delta      WARN  Failed to commit: Previous not current: log head=id:f9545e : patch previous=id:b17239
[2020-03-02 11:51:44] HTTP       WARN  Patch id:2748cb : HTTP bad request: Previous not current: log head=id:f9545e : patch previous=id:b17239
[2020-03-02 11:51:44] Delta      WARN  Failed to commit: Previous not current: log head=id:f9545e : patch previous=id:b17239
[2020-03-02 11:51:50] HTTP       WARN  Patch id:6bc627 : HTTP bad request: Previous not current: log head=id:ee8385 : patch previous=id:71bf14

... or even

[2020-03-02 12:03:57] Delta      WARN  [ds:%s %s] Remote version behind local tracking of remote version: [%d, %d] [Ljava.lang.Object;@6c8c2caa
afs commented

Hi @arrrght, this happens when an update is attempted and another update has happened between the last synchronization and sending the patch. Is this repeatable?

The deployment is simple: ZooKeeper, one patchlog, and one rdf-delta with an S3-compatible store (MinIO).

Could you explain the deployment in more detail, please? Is this with multiple Fuseki servers?

If it is not Fuseki, what's the application design, and what sync policy is it using for the dataset?

The patch log servers - are they running the embedded ZooKeeper? So there are 3 of them? Or 2 patch log servers and 3 ZooKeeper nodes?

Could you list the log with dcmd ls --server http://_hostname_:1066/

RDF Delta is set up for a low update rate and uses an "optimistic concurrency" scheme; clashes result in "Previous not current". So two updates are happening at about the same time - if the sync is not frequent enough, this can be a problem.
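To illustrate the scheme (a minimal, hypothetical Java sketch - not the actual rdf-delta code): each patch names the id it believes is the current log head, and the append is rejected with "Previous not current" if the head has moved on in the meantime.

class PatchLog {
    private final java.util.List<String> patchIds = new java.util.ArrayList<>();

    // Optimistic append: accept the patch only if 'previousId' still names the log head.
    synchronized void append(String patchId, String previousId) {
        String head = patchIds.isEmpty() ? null : patchIds.get(patchIds.size() - 1);
        boolean current = (head == null) ? (previousId == null) : head.equals(previousId);
        if (!current)
            throw new IllegalStateException(
                "Previous not current: log head=" + head + " : patch previous=" + previousId);
        patchIds.add(patchId);
    }
}

public class OptimisticAppendDemo {
    public static void main(String[] args) {
        PatchLog log = new PatchLog();
        log.append("id:f896d2", null);          // first patch into an empty log
        log.append("id:206536", "id:f896d2");   // in sync: previous matches the head
        log.append("id:1af6b2", "id:f896d2");   // stale view: head is now id:206536 -> rejected
    }
}

The third append fails just like the warnings above: that writer last synced when id:f896d2 was the head, but another writer committed id:206536 in between.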

The consistency and resolution design choice may be something to change, or to provide different options for.

The lack of formatting of the [%d, %d] log message is now fixed.
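For reference, that symptom - "%d" placeholders left unformatted plus a raw "[Ljava.lang.Object;@..." - is what appears when a format string and its argument array never go through String.format. A hypothetical reconstruction (not the actual rdf-delta code), assuming an SLF4J logger:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LogFormatDemo {
    private static final Logger LOG = LoggerFactory.getLogger("Delta");

    public static void main(String[] args) {
        Object[] versions = { 3, 5 };

        // Broken: the placeholders are never substituted, and concatenating the
        // array appends its default toString() -> "... [%d, %d] [Ljava.lang.Object;@..."
        LOG.warn("Remote version behind local tracking of remote version: [%d, %d] " + versions);

        // Fixed: format the message before handing it to the logger.
        LOG.warn(String.format("Remote version behind local tracking of remote version: [%d, %d]",
                versions[0], versions[1]));
    }
}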

Deployment, here:

  • ZooKeeper on 3 nodes, not embedded
  • 2 fuseki-patchlog servers
  • 2 fuseki-rdf-delta pods
  • an external app, connected to rdf-delta through a load balancer as a regular Fuseki

conf.ttl of the fuseki-rdf-delta pods:

...
<#dataset> rdf:type delta:DeltaDataset ;
  delta:changes  ("http://fuseki-patchlog:1066/") ;
...

and ls:

root@fuseki-delta-66596564cf-4kknz:/app# ./dcmd ls --server http://fuseki-patchlog:1066
[id:433e56 f2 <http://delta/f2> [ver:1,ver:5965] id:fa8e2a]
[id:aa89cb lukoil2 <http://delta/lukoil2> [ver:1,ver:3524] id:51d9bf]
[id:d08db3 lukoil <http://delta/lukoil> [ver:1,ver:2025] id:e8ccba]
afs commented

My guess is that there is a high update rate happening and, because of that, the load balancer is spreading the updates across both Fuseki servers.

What I don't understand, if the first logs are from a single server, is why there are several messages with the same "previous" - I'd have expected one. If

[2020-03-02 11:41:53] HTTP       WARN  Patch id:1af6b2 : HTTP bad request: Previous not current: log head=id:206536 : patch previous=id:f896d2
[2020-03-02 11:41:53] Delta      WARN  Failed to commit: Previous not current: log head=id:206536 : patch previous=id:f896d2
[2020-03-02 11:41:53] HTTP       WARN  Patch id:1ed17d : HTTP bad request: Previous not current: log head=id:206536 : patch previous=id:f896d2
[2020-03-02 11:41:53] Delta      WARN  Failed to commit: Previous not current: log head=id:206536 : patch previous=id:f896d2
[2020-03-02 11:41:53] HTTP       WARN  Patch id:af7a96 : HTTP bad request: Previous not current: log head=id:206536 : patch previous=id:f896d2
[2020-03-02 11:41:53] Delta      WARN  Failed to commit: Previous not current: log head=id:206536 : patch previous=id:f896d2

is a combined log, things are a bit different.

This might be a bug and needs investigating, but it isn't the reason the situation starts happening in the first place.

What's the pattern of updates at the time?
Are the updates large (in execution time)?

Presumably, the system functions when a burst of updates ends and a single update happens. Is that correct? The log ids have moved on.

If you can make it happen predictably and in a way I can run on my system that would help.

Sorry for the delay, but we are taking steps to emulate the requests from our program (NDA, sorry) as direct requests to Fuseki.

afs commented

No problem - I have managed to make something similar happen, but having 3 fail close together, and with two Fuseki servers, is odd. Maybe some of the updates were from the external app?

afs commented

The codebase now has a first implementation of a proper fix for this. The overall effect is that the cluster can have only one active writer at a time, so multiple updates across Fuseki servers are serialized. While that sounds like a reduction in capability, it isn't, because (1) a TDB database has one active writer at a time anyway, with multiple updates managed internally, and (2) multiple writers across Fuseki servers are unreliable - as per this report.

There are some improvements to investigate but they are not changes in the way the multi-server coordination works.

This will become 0.9.0 - there is a protocol change, and a 0.8.x server will not work with a 0.9.0 client. The reverse mixed-version setup - a 0.8 client with a 0.9.0 server - will work as if it were a 0.8.0 system, i.e. no fix.

afs commented

Final, better implementation done. This adds a discretionary lock system for adding to a patch log. The clients have timeouts to catch a Fuseki server acquiring the lock and then crashing, or the update stalling. The timeouts are set quite long for these unusual circumstances.
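For intuition only, here is a minimal, hypothetical sketch of such a timeout-guarded lock (not the 0.9.0 implementation): a writer must hold the lock to append, and a lease that outlives its timeout can be reclaimed, so a crashed or stalled server cannot block the log forever.

import java.time.Duration;
import java.time.Instant;

class LeaseLock {
    private String holder = null;
    private Instant acquiredAt = null;
    private final Duration maxHold;

    LeaseLock(Duration maxHold) { this.maxHold = maxHold; }

    // Succeeds if the lock is free, already held by 'who', or the current
    // holder's lease has expired (a crashed or stalled writer).
    synchronized boolean tryAcquire(String who) {
        Instant now = Instant.now();
        boolean expired = holder != null && acquiredAt.plus(maxHold).isBefore(now);
        if (holder == null || holder.equals(who) || expired) {
            holder = who;
            acquiredAt = now;
            return true;
        }
        return false;
    }

    synchronized void release(String who) {
        if (who.equals(holder)) { holder = null; acquiredAt = null; }
    }
}

public class LeaseLockDemo {
    public static void main(String[] args) throws InterruptedException {
        LeaseLock lock = new LeaseLock(Duration.ofMillis(100));
        System.out.println(lock.tryAcquire("fuseki-1"));  // true: lock was free
        System.out.println(lock.tryAcquire("fuseki-2"));  // false: fuseki-1 holds it
        Thread.sleep(150);                                // fuseki-1 crashes or stalls past the lease
        System.out.println(lock.tryAcquire("fuseki-2"));  // true: expired lease reclaimed
    }
}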

This will be in 0.9.0 - sync'ed to Apache Jena 3.17.0.

afs commented

Version 0.9.0 released, sync'ed to Apache Jena 3.17.0. It will propagate to Maven Central.

Note: this requires Java 11.