Unhandled exception in connection loop when administrator sets system time in past
serhiinikitenko opened this issue · 1 comments
serhiinikitenko commented
This issue is easy to reproduce, follow steps:
- Start async zookeeper connection as shown below in the snippet
- Change system time few years back
Expected Behavior
Connection restarted
Actual Behavior
Connection stopped
Snippet to Reproduce the Problem
Python 3.8.16 (default, Dec 23 2022, 09:54:58)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-17)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from kazoo.client import KazooState, KazooClient
>>> from kazoo.retry import KazooRetry
>>>
>>> host = '127.0.0.1:2181'
>>>
>>> zk = KazooClient(hosts=host,
... connection_retry=KazooRetry(max_tries=-1, delay=0.1, backoff=1.5, max_jitter=0.5, max_delay=5),
... command_retry=KazooRetry(max_tries=1, delay=0.1, backoff=2, max_jitter=0.1, max_delay=1))
>>>
>>> zk.start_async()
<threading.Event object at 0x7f1c86a4cd00>
>>>
>>> zk.get('/')
(b'', ZnodeStat(czxid=0, mzxid=0, ctime=0, mtime=0, version=0, cversion=41, aversion=0, ephemeralOwner=0, dataLength=0, numChildren=7, pzxid=115964117194))
>>>
>>> Unhandled exception in connection loop
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/kazoo/protocol/connection.py", line 592, in _connect_attempt
s = self.handler.select([self._socket, self._read_sock],
File "/usr/local/lib/python3.8/site-packages/kazoo/handlers/threading.py", line 172, in select
return selector_select(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/kazoo/handlers/utils.py", line 380, in selector_select
ready = selector.select(timeout)
File "/usr/lib64/python3.8/selectors.py", line 468, in select
fd_event_list = self._selector.poll(timeout, max_ev)
OverflowError: timeout is too large
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib64/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.8/site-packages/kazoo/protocol/connection.py", line 512, in zk_loop
if retry(self._connect_loop, retry) is STOP_CONNECTING:
File "/usr/local/lib/python3.8/site-packages/kazoo/retry.py", line 126, in __call__
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/kazoo/protocol/connection.py", line 552, in _connect_loop
status = self._connect_attempt(host, hostip, port, retry)
File "/usr/local/lib/python3.8/site-packages/kazoo/protocol/connection.py", line 592, in _connect_attempt
s = self.handler.select([self._socket, self._read_sock],
File "/usr/local/lib/python3.8/site-packages/kazoo/handlers/threading.py", line 172, in select
return selector_select(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/kazoo/handlers/utils.py", line 380, in selector_select
ready = selector.select(timeout)
File "/usr/lib64/python3.8/selectors.py", line 468, in select
fd_event_list = self._selector.poll(timeout, max_ev)
OverflowError: timeout is too large
>>> zk.get('/')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/site-packages/kazoo/client.py", line 1165, in get
return self.get_async(path, watch=watch).get()
File "/usr/local/lib/python3.8/site-packages/kazoo/handlers/utils.py", line 80, in get
raise self._exception
kazoo.exceptions.ConnectionClosedError: Connection has been closed
>>>
[vagrant@localhost ~]$ date
Tue May 30 08:24:15 UTC 2023
[vagrant@localhost ~]$ sudo timedatectl set-time '2020-06-01'
Logs with logging in DEBUG mode
COPY/PASTE the result of the snippet here (omit any sensitive information)
Specifications
- Kazoo version: 2.9.0
- Result of
pip list
command: - Zookeeper version:
- Zookeeper configuration: put here any useful ZK configuration (authentication, encryption, number of ZK members, number of (concurrent?) clients, Java version, krb5 version, etc.)
- Python version: 3.8
- OS: Rocky Linux 8.7 (Green Obsidian)
StephenSorriaux commented
Hello,
Thank you for the issue.
Looks like a very valid bug as we are using time.now()
to detect how long we have been waiting in the retry logic. When possible we should use time.monotonic
instead to fix this issue.