don't send ERROR:crowded in response to CLOSE
warner opened this issue · 4 comments
In magic-wormhole/magic-wormhole#408 I think a client received ERROR(crowded)
(eventually, because of #19, so only after the server was rebooted and the client reconnected, and re-sent the OPEN), and then tried to release the mailbox by sending CLOSE, but was thwarted because the server sent back another ERROR(crowded)
. I think the client then got stuck, unable to exit (because it thinks it still has a hold on the mailbox), not sending a new CLOSE (because it already sent one and it's just waiting for CLOSED).
The fix is probably to have handle_close()
(in the clause that does self._app.open_mailbox()
, because this connection didn't have one already) react to CrowdedError
by doing most of the rest of the function, instead of raising Error("crowded")
. We can't do anything that touches self._mailbox
, since we don't have one, but we can set self._did_close
and (most importantly) do self.send("closed")
. That should satisfy the unsettled ghost client and allow it to finally exit.
not sending a new CLOSE (because it already sent one and it's just waiting for CLOSED)
IMO that part is a bug on the client side: it sent a message and got an error
response, thus it should not expect that message to have passed.
The relevant server side code appears to be this:
if not self._mailbox:
try:
self._mailbox = self._app.open_mailbox(mailbox_id, self._side,
server_rx)
except CrowdedError:
raise Error("crowded")
I don't understand: why are we opening a new mailbox in the first place? If there's no mailbox on close, simply ignore it as already closed?
This seems like it could be related to what Im seeing right now on the public wormhole service.
I recent was reusing the same wormhole code
during some testing. After about 12+ sends using the same code I got a waiting for confirmation
and then after ctrl-c since it seemed to hang I got subsequent crowded
errors. These go away if I just get rid of the reused code.
The following behavior is what Im see this morning:
$ wormhole send --code 2-businessman-pheasant openwrt-ath79-generic-netgear_wndr3800ch-squashfs-factory.img
Sending 12.0 MB file named 'openwrt-ath79-generic-netgear_wndr3800ch-squashfs-factory.img'
Wormhole code is: 2-businessman-pheasant
On the other computer, please run:
wormhole receive 2-businessman-pheasant
Key established, waiting for confirmation...
Then the following crowded error:
$ wormhole send --code 2-businessman-pheasant openwrt-ath79-generic-netgear_wndr3800ch-squashfs-factory.img
Sending 12.0 MB file named 'openwrt-ath79-generic-netgear_wndr3800ch-squashfs-factory.img'
Wormhole code is: 2-businessman-pheasant
On the other computer, please run:
wormhole receive 2-businessman-pheasant
Traceback (most recent call last):
--- <exception caught here> ---
File "/nix/store/g9bwgzbj86sjyszyanzpxqjc0db3kbbp-python3.9-magic-wormhole-0.12.0/lib/python3.9/site-packages/wormhole/cli/cli.py", line 122, in _dispatch_command
yield maybeDeferred(command)
File "/nix/store/g9bwgzbj86sjyszyanzpxqjc0db3kbbp-python3.9-magic-wormhole-0.12.0/lib/python3.9/site-packages/wormhole/cli/cmd_send.py", line 93, in go
yield d
File "/nix/store/g9bwgzbj86sjyszyanzpxqjc0db3kbbp-python3.9-magic-wormhole-0.12.0/lib/python3.9/site-packages/wormhole/cli/cmd_send.py", line 139, in _go
yield w.get_unverified_key()
wormhole.errors.ServerError: crowded
ERROR: crowded
Again I can just use a new code but figured it was worth reporting since its fresh. Happy to open up another issue if thats more appropriate.
I can confirm that reusing codes multiple times causes spurious "crowded" issues which could be related to this issue. I have no specific reproduction recipe, but I strongly suspect there is a race hazard involved.
@sarcasticadmin please make sure that you know the security consequences of using a code multiple times and don't use this in production without counter-measures.
@piegamesde thanks for the clarification
@sarcasticadmin please make sure that you know the security consequences of using a code multiple times and don't use this in production without counter-measures.
Yep, absolutely. It was just for testing and pure laziness so I didnt have to keep updating inputs.