self._attempts of KazooRetry will nerver grow up more than 1 when ConnectionDropped exception occur repeatedly in _connect_attempt

Question

self._attempts of KazooRetry will nerver grow up more than 1 when ConnectionDropped exception occur repeatedly in _connect_attempt

thenumberouscode opened this issue 2 years ago · 4 comments

Expected Behavior

when I set max_tries = 3 , I found that the retry still can be infinity. Does I misunderstand the meaning of max_tries?
I think self._attempts should be grow up when retry actually occur.

Actual Behavior

now self._attempts is always change between 0 and 1 when ConnectionDropped exception occur repeatedly in _connect_attempt.

Snippet to Reproduce the Problem

COPY/PASTE the snippet here (omit any sensitive information)

Logs with logging in DEBUG mode

COPY/PASTE the result of the snippet here (omit any sensitive information)

Specifications

Kazoo version: 2.9.0
Result of pip list command:
Zookeeper version:
Zookeeper configuration: put here any useful ZK configuration (authentication, encryption, number of ZK members, number of (concurrent?) clients, Java version, krb5 version, etc.)
Python version:
OS:

Answer 1 · 2022-12-23T05:32:27.000Z

I handwrite a ConnectionDroped exception in connection.py line 587

Answer 2 · 2022-12-23T12:35:55.000Z

then I set max_tries=3, the log indicated that client retried many times.
KazooClient(self.hosts, auth_data=[("digest", self.auth)], connection_retry={"max_tries":"3", "delay": 0.1, "backoff": 2, "max_delay": 3600}, logger=logger)

Answer 3 · 2023-02-19T00:10:42.000Z

Have you noticed this happening in the wild?

If the server was in a bad state, self._connect would fail which is only a few lines above.

Looking at the code, this would only cause this pathological case only if the server would always respond successfully to a connect but fail on a ping.

I guess the assumption would be that after a connection drop in the ping loop, we want to immediately reconnect after which either the problem got solved or the _connect call would throw which happens before the retry.reset() call that results in the status you notice.

IMO I think this assumption holds for the lifetime of a client, especially considering that failure in a _connect_attempt would result in the next connection being attempted on the next host in the list, which hopefully has a high chance of succeeding.

Answer 4 · 2023-07-21T11:26:34.000Z

Yes, I think your understanding of the code logic is reasonable, the issue can be closed.