1 week pass on Discord bot to identify tasks/plan forward
misslivirose opened this issue · 3 comments
The Discord bot tends to self-destruct every week or two. It becomes unresponsive, starts entering and exiting rooms repeatedly, and/or consumes a lot of CPU while spinning on something.
We'll need to investigate what's going wrong and what we can do to fix it. There may be multiple causes for these problems. Here are some possibilities:
- The bot does not handle closed rooms correctly. It seems to spin forever trying to reconnect to rooms that it is still bridged to but that have since been closed. We ought to solve this at the very least because it fills the logs with errors that make it hard to diagnose other potential issues.
- The Discord bot is not resilient to disruptions in the Discord API. If Discord has downtime or latency issues, the bot might disconnect and then fail to reconnect.
- The Discord bot is running into scaling issues in general. It's possible we've hit some sort of limit and need to explore scaling strategies for connecting to the many guilds, channels and rooms that the bot needs to handle. This might involve sharding the bot into multiple processes or multiple server instances. One symptom is that the bot takes about 15 minutes after startup to finish connecting to all of the channels and rooms it needs to serve.
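For the first two possibilities, one direction worth trying is to cap reconnect attempts with exponential backoff and to stop retrying as soon as a failure is known to be permanent (for example, a closed room). Below is a minimal sketch of that idea, not the bot's actual code. Every name in it (`connectWithBackoff`, `backoffDelay`, `BackoffOptions`, and the `connect` and `isPermanent` callbacks) is hypothetical, and how the bot would actually detect a closed room is an assumption left to the caller.

```typescript
// Hypothetical options; none of these names come from the bot's codebase.
interface BackoffOptions {
  baseMs: number;      // delay before the first retry
  maxMs: number;       // upper bound on any single delay
  maxAttempts: number; // total connection attempts, including the first
}

// Delay before retry number `attempt` (0-indexed): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, opts: BackoffOptions): number {
  return Math.min(opts.maxMs, opts.baseMs * Math.pow(2, attempt));
}

// Default sleep based on setTimeout; can be swapped out by the caller.
function defaultSleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Calls `connect` until it succeeds, a permanent failure is seen, or attempts run out.
// Returns null on a permanent failure (assumed here to mean "room closed") so the
// caller can drop the bridge instead of retrying forever.
async function connectWithBackoff<T>(
  connect: () => Promise<T>,
  isPermanent: (err: unknown) => boolean,
  opts: BackoffOptions,
  sleep: (ms: number) => Promise<void> = defaultSleep
): Promise<T | null> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (isPermanent(err)) {
        return null; // permanent failure: give up quietly, do not retry
      }
      if (attempt === opts.maxAttempts - 1) {
        throw err; // out of attempts: surface the last transient error
      }
      await sleep(backoffDelay(attempt, opts));
    }
  }
  throw new Error("maxAttempts must be at least 1");
}
```

With `baseMs: 1000` and `maxMs: 30000`, the waits between attempts grow as 1 s, 2 s, 4 s and so on, up to the 30 s cap. Returning `null` for permanent failures would also keep closed rooms from filling the logs with repeated reconnect errors.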
See also: #102
See also: #107
The closed-room issues were addressed, and scaling was partially addressed, in #117 and Hubs-Foundation/reticulum#493.
I don't think we ever confirmed an issue specific to Discord API disruptions, but there are definitely still stability issues, for which I will open a new issue.