deadlocks in `Channel.call(...)`
jxsl13 opened this issue · 4 comments
Describe the bug
Hi,
I'm (still) developing a wrapper for this library.
I'm trying to properly implement flow control and context handling and trying to test as much as possible, simulating connection loss and more.
For one of my tests I have a RabbitMQ instance that is out of memory on startup, which triggers the connection blocking state.
This state seems to trigger a deadlock (or something along those lines) in this library, making this select statement block "forever":
Lines 181 to 205 in a2fcd5b
I have seen `Channel.Close()` and `Channel.UnbindQueue(...)` block "forever".
The blocking of `Channel.UnbindQueue(...)` is reproduced in the test below.
Might be related to #225 (it might be possible to reproduce "turn off the internet" with the tool that I use for my tests that's called toxiproxy)
Reproduction steps
Here is a test that reproduces the problem:
- have docker & docker compose:
make environment
- execute test:
https://github.com/jxsl13/amqpx/blob/feat/the-context-update/pool/session_test.go#L732-L885
https://github.com/jxsl13/amqpx/blob/main/pool/session_test.go#L682-L837
level=info, msg=creating connection,
level=info, msg=registering flow control notification channel,
level=info, msg=creating channel,
level=info, msg=registering error notification channel,
level=info, msg=registering confirms notification channel,
level=info, msg=registering flow control notification channel,
level=info, msg=registering returned message notification channel,
level=info, msg=declaring exchange,
level=info, msg=declaring queue,
level=info, msg=binding queue,
level=info, msg=publishing message,
level=info, msg=unbinding queue, (blocks here forever)
Expected behavior
`QueueBind` worked, so I guess `QueueUnbind` should also work.
I think this behavior can be triggered for nearly every method of `Channel`.
Additional context
Should not be relevant, but could be:
darwin/arm64
macOS 14.3.1
Thanks for the report and the steps to reproduce this issue. I can reproduce it. As you noted, it requires a blocked RabbitMQ to reproduce.
I'm pretty sure I ran into something similar with `QueueBind`: calling `QueueBind` with `noWait=false` got stuck forever. Would it make sense to add a timeout for this operation?