smrchy/rsmq

realtime queue stops working randomly

createthis opened this issue · 2 comments

We're using RSMQ 0.12.3, and I've noticed that after about a day of server uptime our realtime queue stops processing messages. What is the best way to reload the queue so that it starts processing messages again? Should we do something on a HUP signal, perhaps?

@createthis Could you share some details on how to reproduce the issue?
Please also provide some logs and details of your environment, such as the OS and Node.js version.

Personally, I have had RSMQ running in production setups for months without needing a restart; in my experience it is rock-solid. Are you sure your Redis deployment is healthy? Are you monitoring Redis? Please check its stats and graphs to rule that out.

@solutionguy It took a lot of debugging, but I learned more about this problem and implemented a workaround. We use AWS RDS and have `tcp-keepalive` set to 300 seconds. Despite this, our stunnel encryption service was hitting its `TIMEOUTidle` limit on the connection (which defaults to 43200 seconds, or 12 hours) and resetting it.

The workaround I implemented was a periodic Redis `PING` on the long-running connections.

I don't understand why it was necessary, but the ping appears to have solved the problem.
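The thread does not include the workaround's code, so the following TypeScript sketch shows one way a periodic ping could look. The function names `startKeepalive` and `pingFromCallbackClient`, the callback-style client shape (as in node_redis v3), and the interval used below are assumptions for illustration, not RSMQ API. A plausible explanation for why the ping helps where `tcp-keepalive` did not (unconfirmed in this thread): TCP keepalive probes carry no application data, so stunnel's idle timer does not count them as activity, whereas a `PING` and its `PONG` reply do pass data through the tunnel. `PING` is also one of the few commands Redis accepts on a connection in subscribe mode, which matters if the long-running connection is the subscriber listening for RSMQ's realtime notifications.

```typescript
// Sketch of a periodic keepalive PING for a long-lived Redis connection.
// Assumptions (not from the thread): startKeepalive and pingFromCallbackClient
// are hypothetical helpers, and the caller picks an interval well below any
// idle timeout on the network path (e.g. stunnel's TIMEOUTidle, 43200 s by default).

// Shape of a callback-style client such as node_redis v3,
// where client.ping(cb) invokes cb(err, reply).
interface CallbackPingClient {
  ping(callback: (err: Error | null, reply?: string) => void): unknown;
}

// Adapt a callback-style ping into a promise-returning function.
function pingFromCallbackClient(
  client: CallbackPingClient,
): () => Promise<string | undefined> {
  return () =>
    new Promise<string | undefined>((resolve, reject) => {
      client.ping((err, reply) => (err ? reject(err) : resolve(reply)));
    });
}

// Send a PING every intervalMs so that data regularly crosses the connection.
// Failures (sync throws or rejections) go to onError instead of escaping,
// so one failed ping does not stop the timer. Returns a stop function.
function startKeepalive(
  ping: () => Promise<unknown>,
  intervalMs: number,
  onError: (err: unknown) => void = (err) =>
    console.error("Redis keepalive ping failed:", err),
): () => void {
  if (!(intervalMs > 0)) {
    throw new RangeError(`intervalMs must be positive, got ${intervalMs}`);
  }
  const timer = setInterval(() => {
    Promise.resolve().then(ping).catch(onError);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

For example, with a hypothetical callback-style subscriber connection named `subscriber`, `const stop = startKeepalive(pingFromCallbackClient(subscriber), 60_000);` would ping once a minute, far below the 12-hour `TIMEOUTidle` default, and calling `stop()` on shutdown cancels the timer. A failed ping is only reported here; detecting a dead connection and reconnecting would need extra logic.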