VEuPathDB/vdi-service

RabbitMQ connection retries halted the service startup (and may have crashed the server)

Closed this issue · 3 comments

ERROR com.rabbitmq.client.impl.ForgivingExceptionHandler - Caught an exception when recovering topology Caught an exception while recovering exchange qa-vdi-bucket-notifications: connection is already closed due to connection error; cause: com.rabbitmq.client.MissedHeartbeatException: Detected missed server heartbeats, heartbeat interval: 60 seconds, RabbitMQ node hostname: 172.16.44.201
com.rabbitmq.client.TopologyRecoveryException: Caught an exception while recovering exchange qa-vdi-bucket-notifications: connection is already closed due to connection error; cause: com.rabbitmq.client.MissedHeartbeatException: Detected missed server heartbeats, heartbeat interval: 60 seconds, RabbitMQ node hostname: 172.16.44.201
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverExchange(AutorecoveringConnection.java:770) ~[service.jar:?]
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverTopology(AutorecoveringConnection.java:723) ~[service.jar:?]
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.beginAutomaticRecovery(AutorecoveringConnection.java:602) ~[service.jar:?]
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.lambda$addAutomaticRecoveryListener$3(AutorecoveringConnection.java:524) ~[service.jar:?]
        at com.rabbitmq.client.impl.AMQConnection.notifyRecoveryCanBeginListeners(AMQConnection.java:839) ~[service.jar:?]
        at com.rabbitmq.client.impl.AMQConnection.doFinalShutdown(AMQConnection.java:816) ~[service.jar:?]
        at com.rabbitmq.client.impl.AMQConnection.handleHeartbeatFailure(AMQConnection.java:781) ~[service.jar:?]
        at com.rabbitmq.client.impl.nio.NioLoop.lambda$handleHeartbeatFailure$0(NioLoop.java:281) ~[service.jar:?]
        at java.lang.Thread.run(Thread.java:1589) [?:?]
Caused by: com.rabbitmq.client.AlreadyClosedException: connection is already closed due to connection error; cause: com.rabbitmq.client.MissedHeartbeatException: Detected missed server heartbeats, heartbeat interval: 60 seconds, RabbitMQ node hostname: 172.16.44.201
        at com.rabbitmq.client.impl.AMQChannel.ensureIsOpen(AMQChannel.java:281) ~[service.jar:?]
        at com.rabbitmq.client.impl.AMQChannel.rpc(AMQChannel.java:365) ~[service.jar:?]
        at com.rabbitmq.client.impl.AMQChannel.privateRpc(AMQChannel.java:305) ~[service.jar:?]
        at com.rabbitmq.client.impl.AMQChannel.exnWrappingRpc(AMQChannel.java:152) ~[service.jar:?]
        at com.rabbitmq.client.impl.ChannelN.exchangeDeclare(ChannelN.java:804) ~[service.jar:?]
        at com.rabbitmq.client.impl.ChannelN.exchangeDeclare(ChannelN.java:746) ~[service.jar:?]
        at com.rabbitmq.client.impl.ChannelN.exchangeDeclare(ChannelN.java:47) ~[service.jar:?]
        at com.rabbitmq.client.impl.recovery.RecordedExchange.recover(RecordedExchange.java:36) ~[service.jar:?]
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.lambda$recoverExchange$12(AutorecoveringConnection.java:759) ~[service.jar:?]
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.wrapRetryIfNecessary(AutorecoveringConnection.java:914) ~[service.jar:?]
        at com.rabbitmq.client.impl.recovery.AutorecoveringConnection.recoverExchange(AutorecoveringConnection.java:758) ~[service.jar:?]
        ... 8 more

The error appears to have been caught and logged by the RabbitMQ client itself (via ForgivingExceptionHandler). The service seems to have gone down, since nothing else was logged after this error, but it was not confirmed whether the server was still online.

Possibly related to #13

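This particular situation seems to have happened because RabbitMQ went down after VDI had established its initial connection but, for some reason, did not send a shutdown signal, so the client only noticed the outage through missed heartbeats (MissedHeartbeatException).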
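The heartbeat reasoning above can be illustrated with a simplified model. The sketch below is a hypothetical HeartbeatWatchdog class, not the RabbitMQ Java client's actual implementation; it assumes the AMQP 0-9-1 convention that a peer is treated as unreachable after roughly two heartbeat intervals with no inbound traffic. It shows why a broker that disappears without sending a close frame is only detected after a window of silence rather than immediately.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical, simplified model of heartbeat-based failure detection.
// NOT the RabbitMQ Java client's implementation: it only illustrates why a
// broker that vanishes without a close frame is noticed after a silence window.
public class HeartbeatWatchdog {
    private final Duration interval;  // negotiated heartbeat interval
    private final int missedAllowed;  // assumed threshold (AMQP 0-9-1 convention: 2)
    private Instant lastTraffic;      // time of the last inbound frame

    public HeartbeatWatchdog(Duration interval, int missedAllowed, Instant start) {
        this.interval = interval;
        this.missedAllowed = missedAllowed;
        this.lastTraffic = start;
    }

    // Any inbound frame, heartbeat or otherwise, counts as proof of life.
    public void onFrameReceived(Instant now) {
        lastTraffic = now;
    }

    // True once the peer has been silent for longer than missedAllowed intervals.
    public boolean isPeerDead(Instant now) {
        Duration silence = Duration.between(lastTraffic, now);
        return silence.compareTo(interval.multipliedBy(missedAllowed)) > 0;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        // 60 s is the interval reported in the log; 2 missed beats is an assumption.
        HeartbeatWatchdog w = new HeartbeatWatchdog(Duration.ofSeconds(60), 2, t0);
        w.onFrameReceived(t0.plusSeconds(30));                  // last frame at t0 + 30 s
        System.out.println(w.isPeerDead(t0.plusSeconds(120))); // 90 s silent: false
        System.out.println(w.isPeerDead(t0.plusSeconds(151))); // 121 s silent: true
    }
}
```

Under these assumptions, with the 60-second interval from the log, the client would only flag the broker as dead after about two minutes of silence; the real client's exact threshold and timing may differ.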

To test this locally, we would likely need a custom script that accepts a TCP connection on port 5672 (the default AMQP port) and then immediately crashes without sending any data.
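A minimal sketch of the fake broker described above, written in Java to match the client stack. SilentFakeBroker is a hypothetical helper, not part of any library. It binds a ServerSocket (passing 0 instead of 5672 lets the OS pick a free port for tests), accepts a single connection, and either closes it immediately without writing any bytes (the "crash" case) or, as an added assumption, holds it open and never replies, which may be closer to a broker that vanishes without a shutdown signal.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical test helper (not part of any library): a fake "broker" that
// accepts one TCP connection and never writes a single byte back.
public class SilentFakeBroker implements AutoCloseable {
    private final ServerSocket server;
    private volatile Socket accepted;

    // port: 5672 is the default AMQP port; 0 lets the OS choose a free one.
    // dropImmediately: true closes the connection right after accept (the "crash"
    // case); false keeps it open but silent (assumed "vanished broker" case).
    public SilentFakeBroker(int port, boolean dropImmediately) throws IOException {
        server = new ServerSocket(port);
        Thread acceptor = new Thread(() -> {
            try {
                Socket s = server.accept();
                accepted = s;
                if (dropImmediately) {
                    s.close(); // hang up without sending anything
                    return;
                }
                // Read and discard whatever the client sends (e.g. the AMQP
                // protocol header) without ever replying.
                InputStream in = s.getInputStream();
                byte[] buf = new byte[1024];
                while (in.read(buf) != -1) { /* discard */ }
            } catch (IOException ignored) {
                // Server or connection closed; nothing else to do.
            }
        }, "silent-fake-broker");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    public int port() {
        return server.getLocalPort();
    }

    @Override
    public void close() throws IOException {
        server.close();
        Socket s = accepted;
        if (s != null) {
            s.close();
        }
    }

    public static void main(String[] args) throws Exception {
        try (SilentFakeBroker broker = new SilentFakeBroker(0, true);
             Socket client = new Socket(InetAddress.getLoopbackAddress(), broker.port())) {
            client.setSoTimeout(5000);
            // The fake broker hangs up without sending data, so read() sees EOF.
            System.out.println(client.getInputStream().read()); // prints -1
        }
    }
}
```

Pointing the service's RabbitMQ host and port at this helper should reproduce a broker that never completes the AMQP handshake. Note that this exercises the initial connection path only; reproducing a failure after a successful connection, as in the trace above, would likely need a proxy in front of a real broker that stops forwarding traffic partway through.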