queries are A LOT slower after losing and re-establishing a connection to zookeeper
bsara opened this issue · 20 comments
In the tests that I conducted on my local machine, I was seeing query speeds of ~50ms go to no better than 10 seconds for the same query after re-establishing a lost connection to zookeeper.
Setup
- single rdf-delta (v1.1.2) server utilizes zookeeper and s3 bucket for storage
- at least two rdf-delta-fuseki (v1.1.2) servers running and using the single rdf-delta server
Steps to Reproduce
- Start zookeeper, s3 bucket, rdf-delta, and all rdf-delta-fuseki servers.
- Run the following query against any of the fuseki servers:
You should receive a response very, very quickly (mine usually returned in under 100 ms)
ASK WHERE { ?s ?p ?o }
- Restart the zookeeper server so that the rdf-delta server looses its connection and has to re-establish it.
- After the connection is re-established, submit the query from step 2 to any fuseki server again. The query will take significantly longer to complete and will continue to take a long time to complete until the rdf-delta server is restarted (for me, I couldn't get anything better than a 10 second response time).
RDF Delta Logs
I've added labels to the below log to indicate when each step described above (except for step 1) was performed during the logging.
...3:30,056 DEBUG [main-1] D.Delta : Connection string: zookeeper:2181
...3:30,073 DEBUG [main-1] D.Delta : Delta initialization (level 0)
...3:30,074 DEBUG [main-1] D.Delta : Delta initialization (level 1)
...3:30,077 DEBUG [main-1] D.Delta : Provider: file
...3:30,077 DEBUG [main-1] D.Delta : Provider: rdb
...3:30,077 DEBUG [main-1] D.Delta : Provider: mem
...3:30,077 DEBUG [main-1] D.Delta : Provider: zk
...3:30,078 DEBUG [main-1] D.Delta : Provider: Local
...3:30,083 DEBUG [main-1] D.Config : Setup for provider: zookeeper+s3
...3:31,349 INFO [main-1] work.imps.CuratorFrameworkImpl : Starting
...3:31,355 DEBUG [main-1] curator.CuratorZookeeperClient : Starting
...3:31,355 DEBUG [main-1] apache.curator.ConnectionState : Starting
...3:31,356 DEBUG [main-1] apache.curator.ConnectionState : reset
...3:31,362 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:zookeeper.version=3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
...3:31,362 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:host.name=rdf-delta
...3:31,363 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:java.version=11.0.14
...3:31,364 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:java.vendor=Alpine
...3:31,365 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:java.home=/usr/lib/jvm/java-11-openjdk
...3:31,366 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:java.class.path=/opt/rdf-delta/delta-server.jar
...3:31,367 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:java.library.path=/usr/lib/jvm/java-11-openjdk/lib/server:/usr/lib/jvm/java-11-openjdk/lib:/usr/lib/jvm/java-11-openjdk/../lib:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
...3:31,367 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:java.io.tmpdir=/tmp
...3:31,368 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:java.compiler=<NA>
...3:31,369 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:os.name=Linux
...3:31,370 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:os.arch=amd64
...3:31,371 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:os.version=5.10.104-linuxkit
...3:31,372 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:user.name=root
...3:31,373 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:user.home=/root
...3:31,373 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:user.dir=/opt/rdf-delta
...3:31,374 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:os.memory.free=992MB
...3:31,374 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:os.memory.max=2048MB
...3:31,375 INFO [main-1] o.apache.zookeeper.ZooKeeper : Client environment:os.memory.total=1024MB
...3:31,384 INFO [main-1] o.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=zookeeper:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@771a7d53
...3:31,394 INFO [main-1] ache.zookeeper.common.X509Util : Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
...3:31,404 INFO [main-1] che.zookeeper.ClientCnxnSocket : jute.maxbuffer value is 1048575 Bytes
...3:31,414 INFO [main-1] o.apache.zookeeper.ClientCnxn : zookeeper.request.timeout value is 0. feature enabled=false
...3:31,438 DEBUG [main-SendThread(zookeeper:2181)-14] .zookeeper.SaslServerPrincipal : Canonicalized address to zookeeper.localdev
...3:31,440 INFO [main-1] work.imps.CuratorFrameworkImpl : Default schema
...3:31,441 INFO [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Opening socket connection to server zookeeper/172.18.0.2:2181.
...3:31,442 INFO [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : SASL config status: Will not attempt to authenticate using SASL (unknown error)
...3:31,460 INFO [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Socket connection established, initiating session, client: /172.18.0.6:48114, server: zookeeper/172.18.0.2:2181
...3:31,465 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Session establishment request sent on zookeeper/172.18.0.2:2181
...3:31,471 INFO [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Session establishment complete on server zookeeper/172.18.0.2:2181, session id = 0x100006632d40003, negotiated timeout = 10000
...3:31,474 DEBUG [main-EventThread-15] apache.curator.ConnectionState : Negotiated session timeout: 10000
...3:31,478 INFO [main-EventThread-15] k.state.ConnectionStateManager : State change: CONNECTED
...3:31,479 DEBUG [main-EventThread-15] work.imps.CuratorFrameworkImpl : Clearing sleep for 1 operations
...3:31,492 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:/zookeeper/config serverPath:/zookeeper/config finished:false header:: 1,4 replyHeader:: 1,1148278,0 request:: '/zookeeper/config,T response:: ,s{0,0,0,0,0,0,-1,0,0,0,0}
...3:31,496 INFO [main-EventThread-15] framework.imps.EnsembleTracker : New config event received: {}
...3:31,496 DEBUG [main-EventThread-15] framework.imps.EnsembleTracker : Ignoring new config as it is empty
...3:31,497 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:/zookeeper/config serverPath:/zookeeper/config finished:false header:: 2,4 replyHeader:: 2,1148278,0 request:: '/zookeeper/config,T response:: ,s{0,0,0,0,0,0,-1,0,0,0,0}
...3:31,498 INFO [main-EventThread-15] framework.imps.EnsembleTracker : New config event received: {}
...3:31,499 DEBUG [main-EventThread-15] framework.imps.EnsembleTracker : Ignoring new config as it is empty
...3:31,507 INFO [main-1] ocal.patchstor.zk.PatchStoreZk : Connecting to zookeeper
...3:31,513 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 3,3 replyHeader:: 3,1148278,0 request:: '/delta,F response:: s{1148099,1148099,1661472301524,1661472301524,0,3,0,0,0,3,1148109}
...3:31,519 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 4,12 replyHeader:: 4,1148278,0 request:: '/delta/activeLogs,T response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...3:31,524 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 5,4 replyHeader:: 5,1148278,0 request:: '/delta/activeLogs/my-log,F response:: #7b202022696422203a202269643a61613862623464392d373139662d346362342d616366622d38353561343532333664373722202c20226e616d6522203a2022636f6c6c6563746976652d6f6e746f6c6f677922202c202275726922203a202264656c74613a636f6c6c6563746976652d6f6e746f6c6f677922207d,s{1148121,1148257,1661472334619,1662141396906,2,0,0,0,124,0,1148121}
...3:31,544 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 6,3 replyHeader:: 6,1148278,0 request:: '/delta/logs/my-log,F response:: s{1148114,1148114,1661472334570,1661472334570,0,6,0,0,0,6,1148120}
...3:31,552 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 7,12 replyHeader:: 7,1148278,0 request:: '/delta/logs/my-log/versions,F response:: v{'00000001},s{1148117,1148117,1661472334599,1661472334599,0,1,0,0,0,1,1148205}
...3:31,555 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 8,4 replyHeader:: 8,1148278,0 request:: '/delta/logs/my-log/versions/00000001,F response:: #fffffff164ffffffb224ffffffbaffffff804dffffffbbffffff9a443f43ffffffe314ffffff85ffffff90,s{1148205,1148205,1661472404031,1661472404031,0,0,0,0,16,0,1148205}
...3:31,558 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 9,4 replyHeader:: 9,1148278,0 request:: '/delta/logs/my-log/state,T response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...3:31,576 DEBUG [main-1] D.Delta : Register datasource: id:aa8bb4
...3:31,578 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 10,4 replyHeader:: 10,1148278,0 request:: '/delta/activeLogs/other-log,F response:: #7b202022696422203a202269643a65313338366533322d663236382d343465352d383761632d65326637396364376236303322202c20226e616d6522203a2022636d742d6465706c6f7922202c202275726922203a202264656c74613a636d742d6465706c6f7922207d,s{1148134,1148134,1661472335155,1661472335155,0,0,0,0,106,0,1148134}
...3:31,582 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 11,3 replyHeader:: 11,1148278,0 request:: '/delta/logs/other-log,F response:: s{1148127,1148127,1661472335106,1661472335106,0,6,0,0,0,6,1148133}
...3:31,585 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 12,12 replyHeader:: 12,1148278,0 request:: '/delta/logs/other-log/versions,F response:: v{},s{1148130,1148130,1661472335119,1661472335119,0,0,0,0,0,0,1148130}
...3:31,588 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 13,4 replyHeader:: 13,1148278,0 request:: '/delta/logs/other-log/state,T response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...3:31,594 DEBUG [main-1] D.Delta : Register datasource: id:e1386e
...3:31,597 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 14,12 replyHeader:: 14,1148278,0 request:: '/delta/activeLogs,T response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...3:31,601 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 15,12 replyHeader:: 15,1148278,0 request:: '/delta/activeLogs,F response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...3:31,603 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 16,4 replyHeader:: 16,1148278,0 request:: '/delta/logs/my-log/dsd,F response:: #7b202022696422203a202269643a61613862623464392d373139662d346362342d616366622d38353561343532333664373722202c20226e616d6522203a2022636f6c6c6563746976652d6f6e746f6c6f677922202c202275726922203a202264656c74613a636f6c6c6563746976652d6f6e746f6c6f677922207d,s{1148115,1148115,1661472334574,1661472334574,0,0,0,0,124,0,1148115}
...3:31,607 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 17,4 replyHeader:: 17,1148278,0 request:: '/delta/logs/other-log/dsd,F response:: #7b202022696422203a202269643a65313338366533322d663236382d343465352d383761632d65326637396364376236303322202c20226e616d6522203a2022636d742d6465706c6f7922202c202275726922203a202264656c74613a636d742d6465706c6f7922207d,s{1148128,1148128,1661472335110,1661472335110,0,0,0,0,106,0,1148128}
...3:31,611 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 18,12 replyHeader:: 18,1148278,0 request:: '/delta/activeLogs,F response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...3:31,613 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 19,12 replyHeader:: 19,1148278,0 request:: '/delta/activeLogs,F response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...3:31,613 INFO [main-1] D.Delta : Provider: zks3
...3:31,615 DEBUG [main-1] D.Delta : [id:aa8bb4, my-log, <delta:my-log>]
...3:31,615 DEBUG [main-1] D.Delta : [id:e1386e, other-log, <delta:other-log>]
...3:31,616 DEBUG [main-1] D.Delta : DataSources: zks3 : [[id:aa8bb4, my-log, <delta:my-log>], [id:e1386e, other-log, <delta:other-log>]]
...3:31,619 INFO [main-1] D.Config : Delta Server jetty config=/opt/rdf-delta/jetty.xml
...3:31,635 INFO [main-1] D.Server : Jetty server config file = /opt/rdf-delta/jetty.xml
...3:31,919 INFO [main-1] D.Delta : RDF Delta 1.1.2 2022-08-17T12:45:57+0000
...3:31,919 DEBUG [main-1] D.Delta : Server start: port=1063
...3:31,921 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 20,12 replyHeader:: 20,1148278,0 request:: '/delta/activeLogs,F response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...3:31,927 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 21,4 replyHeader:: 21,1148278,0 request:: '/delta/logs/other-log/state,T response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...3:31,929 INFO [main-1] D.Delta : Data source: [id:e1386e, other-log, <delta:other-log>] version [0,0] type=zks3
...3:31,932 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 22,4 replyHeader:: 22,1148278,0 request:: '/delta/logs/my-log/state,T response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...3:31,933 INFO [main-1] D.Delta : Data source: [id:aa8bb4, my-log, <delta:my-log>] version [1,1] type=zks3
...3:35,270 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 1ms.
...3:38,608 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 1ms.
...3:41,946 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 2ms.
...3:44,362 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 23,4 replyHeader:: 23,1148278,0 request:: '/delta/logs/my-log/state,T response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...3:44,363 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 2ms.
------------------------------------------------------------------------------------------------------------------------------------
Step 2 query happened here
------------------------------------------------------------------------------------------------------------------------------------
...3:44,371 DEBUG [qtp1482087396-23-23] D.RPC : [1] describe_log {"datasource" : "aa8bb4d9-719f-4cb4-acfb-855a45236d77" } => { "id" : "id:aa8bb4d9-719f-4cb4-acfb-855a45236d77" , "name" : "my-log" , "uri" : "delta:my-log" , "min_version" : 1 , "max_version" : 1 , "latest" : "id:f164b224-ba80-4dbb-9a44-3f43e3148590" }
...3:47,701 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 2ms.
...3:51,037 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 1ms.
------------------------------------------------------------------------------------------------------------------------------------
Step 3 zookeeper restart happened here
------------------------------------------------------------------------------------------------------------------------------------
...3:53,710 WARN [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Session 0x100006632d40003 for sever zookeeper/172.18.0.2:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read additional data from server sessionid 0x100006632d40003, likely server has closed socket
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290) ~[delta-server.jar:?]
...3:54,998 DEBUG [main-SendThread(zookeeper:2181)-14] .zookeeper.SaslServerPrincipal : Canonicalized address to zookeeper.localdev
...3:54,998 INFO [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Opening socket connection to server zookeeper/172.18.0.2:2181.
...3:54,998 INFO [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : SASL config status: Will not attempt to authenticate using SASL (unknown error)
...3:56,011 WARN [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Session 0x100006632d40003 for sever zookeeper/172.18.0.2:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777) ~[?:?]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:344) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290) ~[delta-server.jar:?]
...3:56,012 DEBUG [main-SendThread(zookeeper:2181)-14] .zookeeper.ClientCnxnSocketNIO : Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureOpen(SocketChannelImpl.java:160) ~[?:?]
at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:913) ~[?:?]
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:398) ~[?:?]
at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:193) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1390) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxn$SendThread.cleanAndNotifyState(ClientCnxn.java:1331) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1309) ~[delta-server.jar:?]
...3:56,013 DEBUG [main-SendThread(zookeeper:2181)-14] .zookeeper.ClientCnxnSocketNIO : Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at sun.nio.ch.SocketChannelImpl.ensureOpen(SocketChannelImpl.java:160) ~[?:?]
at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:930) ~[?:?]
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:406) ~[?:?]
at org.apache.zookeeper.ClientCnxnSocketNIO.cleanup(ClientCnxnSocketNIO.java:198) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1390) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxn$SendThread.cleanAndNotifyState(ClientCnxn.java:1331) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1309) ~[delta-server.jar:?]
...3:56,116 DEBUG [main-EventThread-15] o.apache.curator.RetryLoopImpl : Retry-able exception received
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /delta/logs/my-log/state
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102) ~[delta-server.jar:?]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[delta-server.jar:?]
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2358) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:327) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:316) ~[delta-server.jar:?]
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:313) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:304) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:35) ~[delta-server.jar:?]
at org.seaborne.delta.zk.Zk.zkFetch(Zk.java:155) ~[delta-server.jar:?]
at org.seaborne.delta.zk.Zk.zkFetchJson(Zk.java:138) ~[delta-server.jar:?]
at org.seaborne.delta.server.local.patchstores.zk.PatchLogIndexZk.getWatchedState(PatchLogIndexZk.java:288) ~[delta-server.jar:?]
at org.seaborne.delta.server.local.patchstores.zk.PatchLogIndexZk.syncState(PatchLogIndexZk.java:282) ~[delta-server.jar:?]
at org.seaborne.delta.server.local.patchstores.zk.PatchLogIndexZk.lambda$new$0(PatchLogIndexZk.java:112) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:77) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:588) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[delta-server.jar:?]
...3:57,714 DEBUG [main-SendThread(zookeeper:2181)-14] .zookeeper.SaslServerPrincipal : Canonicalized address to zookeeper.localdev
...3:57,715 INFO [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Opening socket connection to server zookeeper/172.18.0.2:2181.
...3:57,716 INFO [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : SASL config status: Will not attempt to authenticate using SASL (unknown error)
...3:57,717 INFO [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Socket connection established, initiating session, client: /172.18.0.6:48126, server: zookeeper/172.18.0.2:2181
...3:57,718 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Session establishment request sent on zookeeper/172.18.0.2:2181
...3:57,739 INFO [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Session establishment complete on server zookeeper/172.18.0.2:2181, session id = 0x100006632d40003, negotiated timeout = 10000
...3:57,748 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 24,101 replyHeader:: 24,1148278,0 request:: 1148278,v{'/zookeeper/config,'/delta/logs/my-log/state,'/delta/logs/other-log/state},v{},v{'/delta/activeLogs} response:: null
...4:01,081 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 3ms.
...4:02,297 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 25,4 replyHeader:: 25,1148278,0 request:: '/delta/logs/my-log/state,T response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
------------------------------------------------------------------------------------------------------------------------------------
Step 4 queries start happening here
------------------------------------------------------------------------------------------------------------------------------------
...4:02,299 DEBUG [qtp1482087396-18-18] D.RPC : [2] describe_log {"datasource" : "aa8bb4d9-719f-4cb4-acfb-855a45236d77" } => { "id" : "id:aa8bb4d9-719f-4cb4-acfb-855a45236d77" , "name" : "my-log" , "uri" : "delta:my-log" , "min_version" : 1 , "max_version" : 1 , "latest" : "id:f164b224-ba80-4dbb-9a44-3f43e3148590" }
...4:05,614 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 1ms.
...4:06,117 DEBUG [main-EventThread-15] o.apache.curator.RetryLoopImpl : Retrying operation
...4:06,119 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 26,4 replyHeader:: 26,1148278,0 request:: '/delta/logs/my-log/state,T response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:06,131 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x100006632d40003, packet:: clientPath:null serverPath:null finished:false header:: 27,4 replyHeader:: 27,1148278,0 request:: '/delta/logs/other-log/state,T response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...4:06,132 INFO [main-EventThread-15] k.state.ConnectionStateManager : State change: SUSPENDED
...4:09,470 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 2ms.
...4:12,807 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 1ms.
...4:16,121 DEBUG [main-SendThread(zookeeper:2181)-14] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x100006632d40003 after 2ms.
...4:16,132 WARN [Curator-ConnectionStateManager-0-13] k.state.ConnectionStateManager : Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 10000. Adjusted session timeout ms: 10000
...4:16,134 INFO [Curator-ConnectionStateManager-0-13] he.zookeeper.ZooKeeperTestable : injectSessionExpiration() called
...4:16,237 DEBUG [main-EventThread-15] o.apache.curator.RetryLoopImpl : Retry-able exception received
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /delta/activeLogs
at org.apache.zookeeper.KeeperException.create(KeeperException.java:134) ~[delta-server.jar:?]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[delta-server.jar:?]
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2857) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231) ~[delta-server.jar:?]
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41) ~[delta-server.jar:?]
at org.seaborne.delta.server.local.patchstores.zk.PatchStoreZk.getWatchLogs(PatchStoreZk.java:187) ~[delta-server.jar:?]
at org.seaborne.delta.server.local.patchstores.zk.PatchStoreZk.lambda$new$1(PatchStoreZk.java:120) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:77) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:588) ~[delta-server.jar:?]
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) ~[delta-server.jar:?]
...4:21,180 DEBUG [qtp1482087396-24-24] o.apache.curator.RetryLoopImpl : Retry-able exception received
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /delta/logs/my-log/state
at org.apache.zookeeper.KeeperException.create(KeeperException.java:134) ~[delta-server.jar:?]
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[delta-server.jar:?]
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:2358) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:327) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetDataBuilderImpl$4.call(GetDataBuilderImpl.java:316) ~[delta-server.jar:?]
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetDataBuilderImpl.pathInForeground(GetDataBuilderImpl.java:313) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:304) ~[delta-server.jar:?]
at org.apache.curator.framework.imps.GetDataBuilderImpl.forPath(GetDataBuilderImpl.java:35) ~[delta-server.jar:?]
at org.seaborne.delta.zk.Zk.zkFetch(Zk.java:155) ~[delta-server.jar:?]
at org.seaborne.delta.zk.Zk.zkFetchJson(Zk.java:138) ~[delta-server.jar:?]
at org.seaborne.delta.server.local.patchstores.zk.PatchLogIndexZk.getWatchedState(PatchLogIndexZk.java:288) ~[delta-server.jar:?]
at org.seaborne.delta.server.local.patchstores.zk.PatchLogIndexZk.syncState(PatchLogIndexZk.java:282) ~[delta-server.jar:?]
at org.seaborne.delta.server.local.patchstores.zk.PatchLogIndexZk.syncVersionInfo(PatchLogIndexZk.java:293) ~[delta-server.jar:?]
at org.seaborne.delta.server.local.patchstores.PatchLogBase.getInfo(PatchLogBase.java:111) ~[delta-server.jar:?]
at org.seaborne.delta.server.local.DeltaLinkLocal.getPatchLogInfo(DeltaLinkLocal.java:195) ~[delta-server.jar:?]
at org.seaborne.delta.server.http.S_DRPC.describePatchLog(S_DRPC.java:325) ~[delta-server.jar:?]
at org.seaborne.delta.server.http.S_DRPC.executeAction(S_DRPC.java:159) ~[delta-server.jar:?]
at org.seaborne.delta.server.http.DeltaServlet.doCommon(DeltaServlet.java:147) ~[delta-server.jar:?]
at org.seaborne.delta.server.http.S_DRPC.doPost(S_DRPC.java:60) ~[delta-server.jar:?]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:523) ~[delta-server.jar:?]
at org.seaborne.delta.server.http.DeltaServlet.service(DeltaServlet.java:87) ~[delta-server.jar:?]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:590) ~[delta-server.jar:?]
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:764) ~[delta-server.jar:?]
at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1630) ~[delta-server.jar:?]
at org.seaborne.delta.server.http.F_PatchFilter.doFilter(F_PatchFilter.java:77) ~[delta-server.jar:?]
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202) ~[delta-server.jar:?]
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1600) ~[delta-server.jar:?]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:506) ~[delta-server.jar:?]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221) ~[delta-server.jar:?]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1378) ~[delta-server.jar:?]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176) ~[delta-server.jar:?]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:463) ~[delta-server.jar:?]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174) ~[delta-server.jar:?]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1300) ~[delta-server.jar:?]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129) ~[delta-server.jar:?]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122) ~[delta-server.jar:?]
at org.eclipse.jetty.server.Server.handle(Server.java:562) ~[delta-server.jar:?]
at org.eclipse.jetty.server.HttpChannel.lambda$handle$0(HttpChannel.java:505) ~[delta-server.jar:?]
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:762) ~[delta-server.jar:?]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:497) ~[delta-server.jar:?]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282) ~[delta-server.jar:?]
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:319) ~[delta-server.jar:?]
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100) ~[delta-server.jar:?]
at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:530) ~[delta-server.jar:?]
at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:379) ~[delta-server.jar:?]
at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:146) ~[delta-server.jar:?]
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100) ~[delta-server.jar:?]
at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53) ~[delta-server.jar:?]
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:412) ~[delta-server.jar:?]
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:381) ~[delta-server.jar:?]
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:268) ~[delta-server.jar:?]
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138) ~[delta-server.jar:?]
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:407) ~[delta-server.jar:?]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894) ~[delta-server.jar:?]
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038) ~[delta-server.jar:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
...4:26,133 WARN [Curator-ConnectionStateManager-0-13] k.state.ConnectionStateManager : Session timeout has elapsed while SUSPENDED. Injecting a session expiration. Elapsed ms: 10001. Adjusted session timeout ms: 10000
...4:26,134 DEBUG [Curator-ConnectionStateManager-0-13] apache.curator.ConnectionState : reset
...4:26,134 DEBUG [Curator-ConnectionStateManager-0-13] o.apache.zookeeper.ZooKeeper : Close called on already closed client
...4:26,134 INFO [Curator-ConnectionStateManager-0-13] o.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=zookeeper:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@771a7d53
...4:26,135 INFO [Curator-ConnectionStateManager-0-13] che.zookeeper.ClientCnxnSocket : jute.maxbuffer value is 1048575 Bytes
...4:26,136 INFO [Curator-ConnectionStateManager-0-13] o.apache.zookeeper.ClientCnxn : zookeeper.request.timeout value is 0. feature enabled=false
...4:26,139 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] .zookeeper.SaslServerPrincipal : Canonicalized address to zookeeper.localdev
...4:26,139 INFO [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : Opening socket connection to server zookeeper/172.18.0.2:2181.
...4:26,140 INFO [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : SASL config status: Will not attempt to authenticate using SASL (unknown error)
...4:26,140 INFO [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : Socket connection established, initiating session, client: /172.18.0.6:48140, server: zookeeper/172.18.0.2:2181
...4:26,141 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : Session establishment request sent on zookeeper/172.18.0.2:2181
...4:26,147 INFO [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : Session establishment complete on server zookeeper/172.18.0.2:2181, session id = 0x1000069b68a0000, negotiated timeout = 10000
...4:26,147 DEBUG [Curator-ConnectionStateManager-0-EventThread-29] apache.curator.ConnectionState : Negotiated session timeout: 10000
...4:26,147 INFO [Curator-ConnectionStateManager-0-EventThread-29] k.state.ConnectionStateManager : State change: LOST
...4:26,147 INFO [Curator-ConnectionStateManager-0-EventThread-29] k.state.ConnectionStateManager : State change: RECONNECTED
...4:26,147 DEBUG [Curator-ConnectionStateManager-0-EventThread-29] work.imps.CuratorFrameworkImpl : Clearing sleep for 0 operations
...4:26,149 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:/zookeeper/config serverPath:/zookeeper/config finished:false header:: 1,4 replyHeader:: 1,1148279,0 request:: '/zookeeper/config,T response:: ,s{0,0,0,0,0,0,-1,0,0,0,0}
...4:26,150 INFO [Curator-ConnectionStateManager-0-EventThread-29] framework.imps.EnsembleTracker : New config event received: {}
...4:26,150 DEBUG [Curator-ConnectionStateManager-0-EventThread-29] framework.imps.EnsembleTracker : Ignoring new config as it is empty
...4:26,237 DEBUG [main-EventThread-15] o.apache.curator.RetryLoopImpl : Retrying operation
...4:26,240 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 2,12 replyHeader:: 2,1148279,0 request:: '/delta/activeLogs,T response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...4:26,243 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 3,4 replyHeader:: 3,1148279,0 request:: '/delta/logs/my-log/state,T response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:26,245 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 4,4 replyHeader:: 4,1148279,0 request:: '/delta/logs/other-log/state,T response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...4:26,246 DEBUG [main-EventThread-15] work.imps.CuratorFrameworkImpl : Clearing sleep for 0 operations
...4:26,248 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 5,12 replyHeader:: 5,1148279,0 request:: '/delta/activeLogs,T response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...4:26,250 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 6,4 replyHeader:: 6,1148279,0 request:: '/delta/logs/my-log/state,T response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:26,252 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 7,4 replyHeader:: 7,1148279,0 request:: '/delta/logs/other-log/state,T response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...4:26,253 WARN [main-EventThread-15] apache.curator.ConnectionState : Session expired event received
...4:26,253 DEBUG [main-EventThread-15] apache.curator.ConnectionState : reset
...4:26,253 DEBUG [main-EventThread-15] o.apache.zookeeper.ZooKeeper : Closing session: 0x1000069b68a0000
...4:26,253 DEBUG [main-EventThread-15] o.apache.zookeeper.ClientCnxn : Closing client for session: 0x1000069b68a0000
...4:26,255 DEBUG [Curator-ConnectionStateManager-0-SendThread(zookeeper:2181)-28] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0000, packet:: clientPath:null serverPath:null finished:false header:: 8,-11 replyHeader:: 8,1148280,0 request:: null response:: null
...4:26,256 DEBUG [main-EventThread-15] o.apache.zookeeper.ClientCnxn : Disconnecting client for session: 0x1000069b68a0000
...4:26,358 INFO [main-EventThread-15] o.apache.zookeeper.ZooKeeper : Session: 0x1000069b68a0000 closed
...4:26,358 INFO [main-EventThread-15] o.apache.zookeeper.ZooKeeper : Initiating client connection, connectString=zookeeper:2181 sessionTimeout=10000 watcher=org.apache.curator.ConnectionState@771a7d53
...4:26,358 INFO [main-EventThread-15] che.zookeeper.ClientCnxnSocket : jute.maxbuffer value is 1048575 Bytes
...4:26,360 INFO [main-EventThread-15] o.apache.zookeeper.ClientCnxn : zookeeper.request.timeout value is 0. feature enabled=false
...4:26,361 INFO [main-EventThread-15] k.state.ConnectionStateManager : State change: LOST
...4:26,362 DEBUG [main-SendThread(zookeeper:2181)-30] .zookeeper.SaslServerPrincipal : Canonicalized address to zookeeper.localdev
...4:26,362 INFO [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Opening socket connection to server zookeeper/172.18.0.2:2181.
...4:26,362 INFO [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : SASL config status: Will not attempt to authenticate using SASL (unknown error)
...4:26,362 INFO [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Socket connection established, initiating session, client: /172.18.0.6:48142, server: zookeeper/172.18.0.2:2181
...4:26,363 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Session establishment request sent on zookeeper/172.18.0.2:2181
...4:26,368 INFO [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Session establishment complete on server zookeeper/172.18.0.2:2181, session id = 0x1000069b68a0001, negotiated timeout = 10000
...4:26,369 DEBUG [main-EventThread-31] apache.curator.ConnectionState : Negotiated session timeout: 10000
...4:26,369 INFO [main-EventThread-31] k.state.ConnectionStateManager : State change: RECONNECTED
...4:26,370 DEBUG [main-EventThread-31] work.imps.CuratorFrameworkImpl : Clearing sleep for 0 operations
...4:26,374 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 1,12 replyHeader:: 1,1148281,0 request:: '/delta/activeLogs,T response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...4:26,375 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:/zookeeper/config serverPath:/zookeeper/config finished:false header:: 2,4 replyHeader:: 2,1148281,0 request:: '/zookeeper/config,T response:: ,s{0,0,0,0,0,0,-1,0,0,0,0}
...4:26,376 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 3,4 replyHeader:: 3,1148281,0 request:: '/delta/logs/my-log/state,T response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:26,380 INFO [main-EventThread-31] framework.imps.EnsembleTracker : New config event received: {}
...4:26,381 DEBUG [main-EventThread-31] framework.imps.EnsembleTracker : Ignoring new config as it is empty
...4:26,381 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 4,4 replyHeader:: 4,1148281,0 request:: '/delta/logs/other-log/state,T response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...4:26,383 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 5,4 replyHeader:: 5,1148281,0 request:: '/delta/logs/my-log/state,T response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:26,386 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 6,12 replyHeader:: 6,1148281,0 request:: '/delta/activeLogs,T response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...4:26,386 INFO [Curator-ConnectionStateManager-0-EventThread-29] o.apache.zookeeper.ClientCnxn : EventThread shut down for session: 0x1000069b68a0000
...4:26,387 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 7,4 replyHeader:: 7,1148281,0 request:: '/delta/logs/other-log/state,T response:: #7b2276657273696f6e22203a2030207d,s{1148129,1148129,1661472335115,1661472335115,0,0,0,0,16,0,1148129}
...4:29,724 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 1ms.
...4:31,183 DEBUG [qtp1482087396-24-24] o.apache.curator.RetryLoopImpl : Retrying operation
...4:31,186 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 8,4 replyHeader:: 8,1148282,0 request:: '/delta/logs/my-log/state,T response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:31,188 DEBUG [qtp1482087396-24-24] D.RPC : [3] describe_log {"datasource" : "aa8bb4d9-719f-4cb4-acfb-855a45236d77" } => { "id" : "id:aa8bb4d9-719f-4cb4-acfb-855a45236d77" , "name" : "my-log" , "uri" : "delta:my-log" , "min_version" : 1 , "max_version" : 1 , "latest" : "id:f164b224-ba80-4dbb-9a44-3f43e3148590" }
...4:34,524 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 1ms.
...4:36,389 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 9,12 replyHeader:: 9,1148282,0 request:: '/delta/activeLogs,T response:: v{'my-log,'other-log},s{1148109,1148109,1661472301621,1661472301621,0,2,0,0,0,2,1148134}
...4:36,389 INFO [main-EventThread-15] o.apache.zookeeper.ClientCnxn : EventThread shut down for session: 0x100006632d40003
...4:39,725 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 1ms.
...4:43,061 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 1ms.
...4:44,602 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 10,4 replyHeader:: 10,1148282,0 request:: '/delta/logs/my-log/state,T response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:44,604 DEBUG [qtp1482087396-21-21] D.RPC : [4] describe_log {"datasource" : "aa8bb4d9-719f-4cb4-acfb-855a45236d77" } => { "id" : "id:aa8bb4d9-719f-4cb4-acfb-855a45236d77" , "name" : "my-log" , "uri" : "delta:my-log" , "min_version" : 1 , "max_version" : 1 , "latest" : "id:f164b224-ba80-4dbb-9a44-3f43e3148590" }
...4:47,939 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 1ms.
...4:51,277 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 2ms.
...4:54,613 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Got ping response for session id: 0x1000069b68a0001 after 1ms.
...4:56,799 DEBUG [main-SendThread(zookeeper:2181)-30] o.apache.zookeeper.ClientCnxn : Reading reply session id: 0x1000069b68a0001, packet:: clientPath:null serverPath:null finished:false header:: 11,4 replyHeader:: 11,1148282,0 request:: '/delta/logs/my-log/state,T response:: #7b20202276657273696f6e22203a2031202c2022696422203a202266313634623232342d626138302d346462622d396134342d33663433653331343835393022207d,s{1148116,1148206,1661472334592,1661472404035,1,0,0,0,66,0,1148116}
...4:56,800 DEBUG [qtp1482087396-24-24] D.RPC : [5] describe_log {"datasource" : "aa8bb4d9-719f-4cb4-acfb-855a45236d77" } => { "id" : "id:aa8bb4d9-719f-4cb4-acfb-855a45236d77" , "name" : "my-log" , "uri" : "delta:my-log" , "min_version" : 1 , "max_version" : 1 , "latest" : "id:f164b224-ba80-4dbb-9a44-3f43e3148590" }
Maybe what you are seeing is possibly due to the time it takes for zookeeper server to be ready to respond to it's client requests.
A ZK server can take a significantly long time to be ready. It is not just a matter of starting the server. But during this time, it's networking is running so at the low level, it clients can connect. They don't get a response though until the server is ready.
The time a ZK server it takes to come up to speed gets longer, the longer the cluster has been in operation - it replays the ZK log. There is also a leader election.
One approach I have seen is to add a server to the cluster, making N+1, then moving a server (e.g for maintenance). The cluster can still operate. Simpy restarting a server drops the cluster to below the critical threshold which causes knock-on effects.
Maybe what you are seeing is possibly due to the time it takes for zookeeper server to be ready to respond to it's client requests.
I've seen the slowness regardless of waiting several minutes or immediately after the zookeeper server is back up and running. Also, at initial startup, I don't see the slowness at all. It only occurs AFTER the rdf-delta server has connected to the zookeeper, then loses that connection and re-establishes it. No amount of time is changing the slowness of the responses from what I'm seeing.
Why did the zookeeper server restart?
I'm not sure what is going on them - I'm don't know which part of the system is pausing.
Is there always a pause? Is it always the same length of time?
Is it always less than a certain amount (e.g. 2 minutes)? That suggest a TCP level timeout.
Maybe there is stale TCP connection (not properly shutdown). The patch server doesn't know the original zookeeper has gone until usage is attempted and then fails.
When the Fuseki server makes a query - is it a query or an update? Updates involve more ZK interaction.
Why did the zookeeper server restart?
I'm not exactly sure. Regardless, the main point is that a connection is lost for one reason or another to zookeeper.
Is there always a pause?
After a connection to zookeeper is re-established, yes. After that connection is re-established and then the entire rdf-delta server is restarted, no. A full restart of the rdf-delta server always fixes the issue.
Is it always the same length of time?
It seems to be. I was clocking the times to complete the simple ask query that I mention in the original post at 10 seconds very consistently.
That suggest a TCP level timeout.
There isn't a timeout on the rdf-delta side, it does complete eventually, it just takes a super long time to complete after the zookeeper connection is re-established.
When the Fuseki server makes a query - is it a query or an update?
It is a query. The only request sent to rdf-delta is the log description RPC call to determine if any new patches are available.
To me, all of the evidence I'm seeing points to the issue being inside the rdf-delta server somewhere (or an api that it is using in the curator libs); especially given the fact that the only way that I've found to resolve the issue is to restart the rdf-delta server itself. I wonder if there are connections that, when closed, aren't actually disposed of or something like that? I really have no idea though. My knowledge about interacting with zookeeper, let alone via curator, is extremely limited.
Loss of a ZK server is supposed to be a rare event.
There are several 10s time related constants:
Zk.createCuratorClient:
RetryPolicy policy = new ExponentialBackoffRetry(10000, 5);
CuratorFramework client =
CuratorFrameworkFactory.builder()
.connectString(connectString)
.sessionTimeoutMs(10000)
.connectionTimeoutMs(10000)
...
Loss of a ZK server is supposed to be a rare event.
I can say that the loss of connection is definitely rare for us, but the few times where it has happened, it has created some serious issues.
Those 10s constants are a good point. That could definitely point us in the right direction.
Also, not all of the zookeeper instances have gone down when the loss of connection occurs, it seems that it will happen if only one of our more than two instances drops...I'll have to double check though.
speaking of those timeouts, it would be really nice to be able to customize them via dcmd.
if only one of our more than two instances drops
I'm not sure what that means.
I'm not sure what that means.
Sorry, I mean that we have more than 2 zookeeper servers running and that if one of them is "lost" momentarily, then this seems to happen.
I hope you have at least 3!
You do have a SPOF at the single patch log server. As these are locally stateless, you won't loose data but can have a service interruption for updates.
So far:
Some sort of of pause when the ZK server changes is to be expected.
The pause seems to be of a length determined by the Curator setup.
When you say "and will continue to take a long time to complete", any queries in the switchover window will have a delay.
Are queries issued serially or overlapping?
If a sequence of queries is sent, waiting for each to complete before sending the next, so queries 2 onwards experience the delay? Even after several minutes? (this is probing the working of the Curator/Zookeeper libraries.)
I hope you have at least 3
Ha. Yes, we do.
You do have a SPOF at the single patch log server.
That is understood.
Are queries issued serially or overlapping?
In our test and production environments, they are overlapping. However, when I did my local testing, I did a single query at a time and waited for a response before submitting another and got the same results (I.E. the 10 second delay in the responses).
As we do run into the same problem, we did some tests and came to the following conclusions:
- The retry policy does not kick in at all.
- In most cases. the connection timeout is the one determining the time.
- In rare cases, the session timeout is added on top of the connection timeout.
- With the timeouts set to 10 secs, the system does not recover from the long answer times, not even after a couple of hours. However, when reducing the connection timeout to 100 ms, the system is able to recover in most cases. However, we are unsure about the implications of such a short timeout.
The tests were done with queries issued serially and some random times between the queries.
Hi @oliverbrandt -- thank you very much for the information; it helps to have another environment reporting.
In what environment are you running? Is it a cloud platform?
May be relevant:
Apache Curator 5.4.0 has just been released. It includes an upgrade of the ZooKeeper dependency from 3.6.3 (As in RDF Delta 1.1.2) to 3.7.1.
The tests to determine the effect of different timeouts were done locally.
The experiences with long running instances have been made on Azure Kubernetes Service.
@bsara @oliverbrandt -- In #154, the use of load balancer confused ZooKeeper. Do either of you use load balancers in front of the ZooKeeper servers?
Azure question: What is being used for the object blob storage? #154 uses MinIO in front of the Azure Blog Store in a k8s environment. It's probably unrelated but it's always useful to know about the deployment setup.
We did configure the components to directly address the ZKs instead of going through a service.
For the blob storage, MinIO is used as a frontend to the Azure Blob Storage.
@bsara's deployment does not have a load balancer in front of zookeeper.