quit --waiting command hang up
unkcpz opened this issue · 5 comments
I think there is a critical issue after upgrade to tornado>5 .
circusctl --endpoint=xxx quit --waiting
will successfully stop the process but hang up in terminal.
After trying to debug this, I find the quit action stuck in
Line 142 in 2bedab0
To my surprise, quit
without --waitting
is all fine.
This issue can simply reproduced by running circusd examples/example2.ini
and then stop arbiter with:
circusctl --timeout 10000 --endpoint tcp://127.0.0.1:5555 quit --waiting
I find for command quit
the property waiting
is not actually used when executing quit procedure.
circus/circus/commands/quit.py
Lines 46 to 47 in 2bedab0
Can it be simply removed?
What makes me curious is what's the different between waiting
and not waiting. When does synchronous CircusClient
send the command over socket and who will receive and process the message?
I spent several hours trying to debug this and unfortunately I have no clues why it hangs. It is obvious that there are some events that are not sent and poller is not receiving them, but I don't know which ones...
As for the removal of the waiting
property - it seems that you are right, there is no point of waiting when using synchronous client.
What I am worried about is that even without waiting
I get random zmq errors when circusd stops. Those appeared after upgrade to Tornado 5, ex.:
2020-07-25 23:27:09 asyncio[7773] [ERROR] Exception in callback BaseAsyncIOLoop._handle_events(14, 1)
handle: <Handle BaseAsyncIOLoop._handle_events(14, 1)>
Traceback (most recent call last):
File ".../asyncio/events.py", line 81, in _run
self._context.run(self._callback, *self._args)
File ".../site-packages/tornado/platform/asyncio.py", line 139, in _handle_events
handler_func(fileobj, events)
File ".../site-packages/zmq/eventloop/zmqstream.py", line 452, in _handle_events
zmq_events = self.socket.EVENTS
File ".../site-packages/zmq/sugar/attrsettr.py", line 48, in __getattr__
return self._get_attr_opt(upper_key, opt)
File ".../site-packages/zmq/sugar/attrsettr.py", line 52, in _get_attr_opt
return self.get(opt)
File "zmq/backend/cython/socket.pyx", line 472, in zmq.backend.cython.socket.Socket.get
File "zmq/backend/cython/socket.pyx", line 135, in zmq.backend.cython.socket._check_closed
zmq.error.ZMQError: Socket operation on non-socket
2020-07-25 23:27:09 circus[7773] [INFO] Stats streamer stopped
Thanks very much for the effort! @biozz
I spent several days in this issue 😿
What I am worried about is that even without waiting I get random zmq errors when circusd stops. Those appeared after upgrade to Tornado 5, ex
This happened to me also in tornado<5, so I guess this is not a critical issue. @sphuber Can you double check it in the circus.log
of your aiida environment?
EDIT: this may relevant to this Fix ZeroMQ assertion error in the client.
As for the removal of the waiting property - it seems that you are right, there is no point of waiting when using synchronous client.
If is the case, do you agree to remove it to get around of the issue? I can make a PR for this.
If is the case, do you agree to remove it to get around of the issue? I can make a PR for this.
I agree, it is the best option right now. In my opinion it doesn't affect most of the "simple" usage scenarios, when you just want to quit regardless of waiting.
This happened to me also in tornado<5, so I guess this is not a critical issue. @sphuber Can you double check it in the
circus.log
of your aiida environment?
Sorry for the late reply. Can confirm that I regularly used to get these ZMQErrors even before the move to tornado>=5, so I don't think that is what causes it.