aweber/rabbitmq-autocluster

auto-clustering not working when container dies in AWS ECS

malawson opened this issue · 2 comments

Hello,
I am trying to use the rabbitmq-autocluster plugin in AWS ECS. Following the instructions at https://github.com/aweber/rabbitmq-autocluster/wiki/AWS%20Configuration, I can get my RabbitMQ instances to auto-cluster on boot using the AWS backend (specifically, AWS autoscaling group membership). Here is the resulting cluster status:

bash-4.3# rabbitmqctl cluster_status
Cluster status of node 'rabbit@ip-10-200-22-148' ...
[{nodes,[{disc,['rabbit@ip-10-200-13-174','rabbit@ip-10-200-2-39',
'rabbit@ip-10-200-22-148']}]},
{running_nodes,['rabbit@ip-10-200-13-174','rabbit@ip-10-200-2-39',
'rabbit@ip-10-200-22-148']},
{cluster_name,<<"rabbit@ip-10-200-2-39.ec2.internal">>},
{partitions,[]},
{alarms,[{'rabbit@ip-10-200-13-174',[]},
{'rabbit@ip-10-200-2-39',[]},
{'rabbit@ip-10-200-22-148',[]}]}]
bash-4.3#

Auto-clustering also works when I terminate an EC2 instance and a replacement is started. The RabbitMQ node list is updated with the newly started instance (which has a new instance ID) once it joins the autoscaling group.
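The node names in this thread follow the EC2 default private hostname pattern (for example, rabbit@ip-10-200-22-148 for 10.200.22.148). The following is a minimal Python sketch of that mapping. It assumes each node's hostname is the instance's default private hostname. It does not call the AWS API and does not reproduce the plugin's own discovery code. The helper names and the input IP list are hypothetical.

```python
def node_name_for_private_ip(ip: str, prefix: str = "rabbit") -> str:
    """Build a node name like rabbit@ip-A-B-C-D from an EC2 private IPv4 address.

    Assumption: the node's hostname is the default EC2 private hostname
    (ip-A-B-C-D), as the node names in this thread suggest.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and 0 <= int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return f"{prefix}@ip-{'-'.join(octets)}"


def expected_nodes(private_ips):
    """Return the sorted node names expected for the given group members."""
    return sorted(node_name_for_private_ip(ip) for ip in private_ips)


if __name__ == "__main__":
    # Hypothetical input: private IPs of the instances currently in the group.
    members = ["10.200.22.148", "10.200.13.174", "10.200.2.39"]
    for name in expected_nodes(members):
        print(name)
```

Under this assumption, a container restarted on the same instance comes back with the same node name. A replacement instance gets a new private IP and therefore a new node name.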

However, whenever one of the RabbitMQ containers in the cluster dies, the cluster is never re-formed. For example, the node rabbit@ip-10-200-13-174 died, and when it came back online it could not rejoin the existing cluster:

bash-4.3# rabbitmqctl cluster_status
Cluster status of node 'rabbit@ip-10-200-13-174' ...
[{nodes,[{disc,['rabbit@ip-10-200-13-174']}]},
{running_nodes,['rabbit@ip-10-200-13-174']},
{cluster_name,<<"rabbit@ip-10-200-13-174.ec2.internal">>},
{partitions,[]},
{alarms,[{'rabbit@ip-10-200-13-174',[]}]}]
bash-4.3# cat /var/lib/rabbitmq/mnesia/cluster_nodes.config
{['rabbit@ip-10-200-13-174'],['rabbit@ip-10-200-13-174']}.
bash-4.3# cat /var/lib/rabbitmq/mnesia/nodes_running_at_shutdown
['rabbit@ip-10-200-13-174'].
bash-4.3#

And here is the state of the existing cluster:

bash-4.3# rabbitmqctl cluster_status
Cluster status of node 'rabbit@ip-10-200-22-148' ...
[{nodes,[{disc,['rabbit@ip-10-200-13-174','rabbit@ip-10-200-2-39',
'rabbit@ip-10-200-22-148']}]},
{running_nodes,['rabbit@ip-10-200-2-39','rabbit@ip-10-200-22-148']},
{cluster_name,<<"rabbit@ip-10-200-2-39.ec2.internal">>},
{partitions,[]},
{alarms,[{'rabbit@ip-10-200-2-39',[]},{'rabbit@ip-10-200-22-148',[]}]}]
bash-4.3# cat /var/lib/rabbitmq/mnesia/cluster_nodes.config
{['rabbit@ip-10-200-13-174','rabbit@ip-10-200-2-39','rabbit@ip-10-200-22-148'],['rabbit@ip-10-200-13-174','rabbit@ip-10-200-2-39','rabbit@ip-10-200-22-148']}.
bash-4.3# cat /var/lib/rabbitmq/mnesia/nodes_running_at_shutdown
['rabbit@ip-10-200-2-39','rabbit@ip-10-200-22-148'].
bash-4.3#
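The following minimal Python sketch compares the two views above without reading the Erlang terms by eye. It assumes cluster_nodes.config has the two-list shape shown here, with the first list taken to be all known cluster members, and that nodes_running_at_shutdown is a flat list of quoted atoms. A regex is enough for that shape but is not a general Erlang term parser. The function names are hypothetical.

```python
import re

# Matches a quoted Erlang atom such as 'rabbit@ip-10-200-2-39'.
ATOM = re.compile(r"'([^']+)'")
# Matches the body of a flat Erlang list such as ['a','b'].
LIST = re.compile(r"\[([^\]]*)\]")


def parse_cluster_nodes(text):
    """Split cluster_nodes.config into (all_nodes, disc_nodes).

    Assumes the two-list shape shown in this thread; regex sketch only.
    """
    bodies = LIST.findall(text)
    if len(bodies) != 2:
        raise ValueError(f"expected two node lists, found {len(bodies)}")
    all_nodes, disc_nodes = (ATOM.findall(body) for body in bodies)
    return all_nodes, disc_nodes


def diagnose(local_node, cluster_nodes_text, running_text):
    """Compare a node's persisted membership with the nodes recorded as running."""
    all_nodes, _disc = parse_cluster_nodes(cluster_nodes_text)
    running = set(ATOM.findall(running_text))
    return {
        # Only the local node is known: it is in a cluster of its own.
        "standalone": all_nodes == [local_node],
        # Members this node still knows about but that are not recorded as running.
        "known_but_not_running": sorted(set(all_nodes) - running),
    }


if __name__ == "__main__":
    # File contents copied from the output above.
    print(diagnose(
        "rabbit@ip-10-200-13-174",
        "{['rabbit@ip-10-200-13-174'],['rabbit@ip-10-200-13-174']}.",
        "['rabbit@ip-10-200-13-174'].",
    ))
    members = "['rabbit@ip-10-200-13-174','rabbit@ip-10-200-2-39','rabbit@ip-10-200-22-148']"
    print(diagnose(
        "rabbit@ip-10-200-22-148",
        "{" + members + "," + members + "}.",
        "['rabbit@ip-10-200-2-39','rabbit@ip-10-200-22-148'].",
    ))
```

For the restarted node this reports standalone as True with nothing missing. For the surviving node it reports standalone as False, with rabbit@ip-10-200-13-174 known but not running. In other words, the survivors still expect the old member back, but the restarted node carries no record of them.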

After seeing this, I thought the cause must be that I wasn't persisting the data in /var/lib/rabbitmq/ on the Docker host. If that data were persisted, a replacement container could read the cluster configuration from disk and rejoin the existing cluster.

I then tried sharing the /var/lib/rabbitmq/ directory with the Docker host via Docker volumes. When I do that, auto-clustering never works, and each RabbitMQ instance ends up in its own stand-alone cluster, like this:

bash-4.3# rabbitmqctl cluster_status
Cluster status of node 'rabbit@ip-10-200-13-174' ...
[{nodes,[{disc,['rabbit@ip-10-200-13-174']}]},
{running_nodes,['rabbit@ip-10-200-13-174']},
{cluster_name,<<"rabbit@ip-10-200-13-174.ec2.internal">>},
{partitions,[]},
{alarms,[{'rabbit@ip-10-200-13-174',[]}]}]
bash-4.3#

Any idea why auto-clustering doesn't work when I set up Docker volumes, or what else I could be doing wrong?

@malawson Without the logs it is difficult to know what is going on there, but it seems that you're starting a fresh node every time. The new node is in its own cluster, which means it didn't have the Mnesia database to read. You need to ensure the Docker container has access to the old database.
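One way to act on that advice is a quick check, run inside the container before RabbitMQ starts, of whether the old database is actually visible. The following minimal Python sketch treats a missing cluster_nodes.config in the Mnesia directory as a sign of a fresh node. The directory default is taken from the paths in this thread and is an assumption about your image. The file check is a heuristic, not a validation of the Mnesia data, and the function names are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: RabbitMQ keeps its Mnesia data here, as the file paths in
# this thread suggest; adjust for your image.
DEFAULT_MNESIA_DIR = "/var/lib/rabbitmq/mnesia"


def looks_like_fresh_node(mnesia_dir=DEFAULT_MNESIA_DIR):
    """Return True if mnesia_dir holds no cluster_nodes.config.

    Heuristic only: a missing file suggests the container started without
    the old database (for example, the volume is not mounted where RabbitMQ
    looks). It does not validate the Mnesia files themselves.
    """
    return not (Path(mnesia_dir) / "cluster_nodes.config").is_file()


def demo():
    """Show both outcomes using a throwaway directory instead of a real node."""
    with tempfile.TemporaryDirectory() as tmp:
        print("empty dir looks fresh:", looks_like_fresh_node(tmp))
        (Path(tmp) / "cluster_nodes.config").write_text(
            "{['rabbit@ip-10-200-13-174'],['rabbit@ip-10-200-13-174']}.\n"
        )
        print("dir with config looks fresh:", looks_like_fresh_node(tmp))


if __name__ == "__main__":
    demo()
```

A file that exists but lists only the local node (as in the stand-alone output above) passes this check. Telling that case apart requires reading the file's contents, not just testing for its presence.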

gmr commented

This plugin was forked by the RabbitMQ team and is now part of RabbitMQ. More information can be found at https://github.com/rabbitmq/rabbitmq-autocluster