Watch timed out when setting timeout as None
tobegit3hub opened this issue · 5 comments
We use python-etcd for leader election. All the workers watch the same key in etcd and try to elect a new leader after the key disappears.
Now we watch the key with the timeout set to None:
self.client.watch(self.master_key, timeout=None)
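For context, here is a minimal sketch of watch-based leader election with python-etcd. It is not the code from leader_election.py, which the thread does not show. It assumes the common pattern of an atomic create (`prevExist=False`) with a TTL, falling back to a watch when another worker already holds the key. `wait_to_become_master`, `node_id` and `ttl=10` are hypothetical names and values, and a real master would also have to refresh the TTL while it leads.

```python
try:
    from etcd import EtcdAlreadyExist  # python-etcd
except ImportError:  # let the sketch run without python-etcd installed
    class EtcdAlreadyExist(Exception):
        pass


def wait_to_become_master(client, master_key, node_id, ttl=10):
    """Block until this worker holds master_key (hypothetical helper).

    client is a python-etcd Client, or any object with the same
    write()/watch() methods.
    """
    while True:
        try:
            # Atomic create: raises EtcdAlreadyExist if another worker
            # already holds the key.
            client.write(master_key, node_id, ttl=ttl, prevExist=False)
            return
        except EtcdAlreadyExist:
            # Block until the key changes (e.g. the old master's TTL expires).
            # timeout=0 disables the client's default read timeout.
            client.watch(master_key, timeout=0)
```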
But after almost one minute, the slave worker throws a timeout exception and exits.
DEBUG:etcd.client:Watch timed out.
Traceback (most recent call last):
File "./manage.py", line 21, in <module>
execute_from_command_line(sys.argv)
File "/usr/lib64/python2.7/site-packages/django/core/management/__init__.py", line 367, in execute_from_command_line
utility.execute()
File "/usr/lib64/python2.7/site-packages/django/core/management/__init__.py", line 359, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/usr/lib64/python2.7/site-packages/django/core/management/base.py", line 294, in run_from_argv
self.execute(*args, **cmd_options)
File "/usr/lib64/python2.7/site-packages/django/core/management/base.py", line 345, in execute
output = self.handle(*args, **options)
File "/home/work/cloud-ml/restful_server/cloud_ml/management/commands/run_queue_consumer.py", line 743, in handle
etcdLeaderElection.wait_to_become_master()
File "/home/work/cloud-ml/restful_server/utils/leader_election.py", line 26, in wait_to_become_master
self.client.watch(self.master_key, timeout=None)
File "/usr/lib/python2.7/site-packages/etcd/client.py", line 736, in watch
recursive=recursive)
File "/usr/lib/python2.7/site-packages/etcd/client.py", line 562, in read
timeout=timeout)
File "/usr/lib/python2.7/site-packages/etcd/client.py", line 840, in wrapper
cause=e
etcd.EtcdWatchTimedOut: Watch timed out: ReadTimeoutError("HTTPConnectionPool(host='10.105.17.85', port=2379): Read timed out.",)
If we set the timeout to 3600, it works much better and does not exit so soon. But that's not what we want: we want to watch the key forever. Not sure if this is a bug in python-etcd.
self.client.watch(self.master_key, timeout=3600)
This may be similar to #202.
As a workaround, we set the timeout like this 😞
import sys
self.client.watch(self.master_key, timeout=sys.maxint)
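Another workaround, sketched here as an alternative to a huge timeout, is to keep a finite timeout and simply re-arm the watch whenever `etcd.EtcdWatchTimedOut` (the exception in the traceback above) is raised. `watch_forever` is a hypothetical helper, and the client is assumed to be a python-etcd `Client` or anything with the same `watch()` method.

```python
try:
    from etcd import EtcdWatchTimedOut  # python-etcd
except ImportError:  # let the sketch run without python-etcd installed
    class EtcdWatchTimedOut(Exception):
        pass


def watch_forever(client, key, timeout=60):
    """Watch key with a finite timeout, re-arming after each timeout."""
    while True:
        try:
            # Returns the result as soon as the key changes.
            return client.watch(key, timeout=timeout)
        except EtcdWatchTimedOut:
            # Nothing happened within `timeout` seconds; watch again.
            continue
```

Re-arming without an index could miss a change that happens between two watches; a more careful version would likely pass `index=` to `watch()` based on the last seen `modifiedIndex`.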
Hi @tobegit3hub, I have definitely never had this problem. More specifically, I have an etcd replication tool running in production that watches etcd for hours without any timeout set.
Even my local tests never showed such behaviour.
So I am at a loss: which versions of Python/urllib3/etcd are you using?
Thanks @lavagetto .
It's easy to reproduce on my hosts with CentOS 7.0, Python 2.7.5, urllib3 1.19.1 and etcd 3.0.15 (git sha: fc00305).
Using timeout=sys.maxint works for us.
@tobegit3hub sorry, I just realized that setting no timeout means the client's default read timeout (which is passed down to urllib3) will be enforced.
You should explicitly set the timeout to 0 here:
import etcd

c = etcd.Client(port=2379)
c.read('/', wait=True)             # will cause the timeout error
c.read('/', wait=True, timeout=0)  # will wait forever
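The behaviour above can be summarised by the following simplified, hypothetical helper. It paraphrases our reading of python-etcd's request wrapper (worth checking against the installed version): `None` falls back to the client's `read_timeout`, which defaults to 60 seconds and would explain the "almost one minute" in the original report, while `0` is turned into "no read timeout" for urllib3.

```python
def resolve_timeout(timeout, read_timeout=60):
    """Map a per-request timeout to the value handed to urllib3.

    Hypothetical helper, not part of python-etcd:
      None  -> fall back to the client's read_timeout (seconds)
      0     -> None, i.e. no read timeout at all (wait forever)
      other -> used as-is, in seconds
    """
    if timeout is None:
        timeout = read_timeout
    if timeout == 0:
        timeout = None
    return timeout
```

If every request from a client should wait indefinitely, passing `read_timeout=0` to `etcd.Client(...)` should have the same effect under this reading, since the fallback value of 0 is then turned into `None`.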
That makes sense, and 0 works for me.
Thanks very much, @lavagetto!