Replicator WebSocket connection timeout is not correctly handled on Android
Closed this issue · 3 comments
workingenius commented
The code I read: commit 7481f1df5edfa1715fe8cb26b5930f38d48392e9 of the couchbase-lite-java-common repo.
- An `AbstractCBLWebSocket` object is constructed with its `webSocket` member variable set to `null`, and it is not set until the `onOpen` event listener is called when the WebSocket handshake finishes (after the TCP connection is established).
- Couchbase Lite Core handles the connection timeout (with a default value of 15), and it eventually calls the `AbstractCBLWebSocket.requestClose` method to close the socket when a timeout is detected.
- The first several lines of `AbstractCBLWebSocket.requestClose` contain a shortcut return if `webSocket` is `null`, which is the case when a connection timeout happens. OkHttp knows nothing about the request to close, so it keeps trying (and retrying) to connect.
- The OkHttp timeouts are explicitly set to 0 (because Core is supposed to take charge), so the connection attempt hangs for a very long time, maybe forever, even if the bad network recovers. Seen from outside, the replicator is stuck in the "connecting" state (sometimes "busy").
- Sometimes OkHttp does decide to stop and raises an exception in time. Luckily, `onFailure` is then called and ends all this. (So we can detect the failure from outside and restart replication.)
- There is another possibility: the handshake finally succeeds after Core has decided to close the connection because of the timeout. I'm not sure what happens then.
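To make the sequence above concrete, here is a minimal, self-contained sketch of the race and of one possible guard. It does not use OkHttp or the real `AbstractCBLWebSocket`; `PendingConnection`, `UnguardedSocket`, and `GuardedSocket` are hypothetical names invented for illustration, and the guard shown is only an assumption about how the problem could be avoided, not necessarily what the actual fix does.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class CloseBeforeOpenSketch {
    // Hypothetical stand-in for an in-flight connection attempt; not OkHttp's API.
    static final class PendingConnection {
        private final AtomicBoolean cancelled = new AtomicBoolean(false);
        void cancel() { cancelled.set(true); }
        boolean isCancelled() { return cancelled.get(); }
    }

    // Mirrors the behaviour described above: the handle is stored only in onOpen.
    static final class UnguardedSocket {
        private PendingConnection webSocket; // null until the handshake completes

        void onOpen(PendingConnection ws) { webSocket = ws; }

        void requestClose() {
            if (webSocket == null) { return; } // shortcut: the attempt keeps running
            webSocket.cancel();
        }
    }

    // One possible guard (an assumption, not the actual fix): keep the handle from
    // construction time and remember that a close was requested.
    static final class GuardedSocket {
        private final PendingConnection connection;
        private boolean closeRequested;

        GuardedSocket(PendingConnection connection) { this.connection = connection; }

        synchronized void requestClose() {
            closeRequested = true;
            connection.cancel(); // effective even before onOpen has run
        }

        // Returns false when the handshake completed after a close request,
        // so the caller can discard the late connection.
        synchronized boolean onOpen() {
            return !closeRequested;
        }
    }

    public static void main(String[] args) {
        PendingConnection first = new PendingConnection();
        UnguardedSocket unguarded = new UnguardedSocket();
        unguarded.requestClose(); // timeout fires before onOpen
        System.out.println("unguarded attempt cancelled: " + first.isCancelled());

        PendingConnection second = new PendingConnection();
        GuardedSocket guarded = new GuardedSocket(second);
        guarded.requestClose(); // timeout fires before onOpen
        System.out.println("guarded attempt cancelled: " + second.isCancelled());
        System.out.println("late onOpen accepted: " + guarded.onOpen());
    }
}
```

In the unguarded case the close request is silently dropped, which matches the "stuck in connecting" symptom. The guarded variant also covers the last bullet: a handshake that completes after the timeout is rejected instead of being treated as a live connection.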
This is our analysis of the problem we encountered. We'd be glad if you could take a look.
bmeike commented
Hey! Thanks!
I've filed https://issues.couchbase.com/browse/CBL-1495 to track this.
workingenius commented
@bmeike Please check this PR, couchbase/couchbase-lite-java-common#15. Thanks.
bmeike commented
I believe this is fixed in couchbase-lite-java-common @ d2413f29730b1d9a4544244f