consensus-shipyard/mir

Testing sometimes hangs

sergefdrv opened this issue · 3 comments

Sometimes make test hangs, e.g. here.

I managed to reproduce it by running the tests in a loop with a timeout (a test run normally takes around 30s on my machine):

while GOFLAGS="-timeout=1m -count=1" make test; do :; done

Hm, it looks like there is some non-deterministic issue when shutting down the gRPC transport after the actual test finishes. I saw it before and thought I had fixed it, but apparently I hadn't.

I think I got it.
The transport layer writes messages that it receives over the network to a channel, from which the node implementation reads them. When the node is shutting down, it stops reading incoming messages from this channel. However, if there are still incoming messages on the wire at that time, the transport layer tries to write them to the channel and blocks (since nobody is reading any more), preventing a clean shutdown of the transport layer. This should not be hard to fix by using a Context for the transport layer.