Cannot Add new nodes in AWS AutoScaling Group

Question


spember opened this issue 8 years ago · 7 comments

I'm not sure exactly what's happening, but about 50% of the time new nodes fail to join the cluster, and I can't find anything meaningful in the logs.
Scenario:

1. Launch an AMI with RabbitMQ and autocluster in an Auto Scaling group. The instances have permission to describe the instances in the Auto Scaling group. The first instance comes up without a problem, forming a cluster of 1 node.
2. Increase the Auto Scaling group's minimum size to more than 1. New instances are launched, but about half the time RabbitMQ fails to start on a new node.
3. When a node fails, I can terminate its instance. The replacement instance that launches joins the cluster just fine.

There's no apparent pattern to which nodes fail; for example, failures don't correlate with availability zone.

The logs on failed nodes look like this:

=INFO REPORT==== 16-Sep-2016::14:58:35 ===
Starting RabbitMQ 3.6.5 on Erlang 19.0.3
Copyright (C) 2007-2016 Pivotal Software, Inc.
Licensed under the MPL.  See http://www.rabbitmq.com/

=INFO REPORT==== 16-Sep-2016::14:58:35 ===
node           : rabbit@<ipaddress>
home dir       : /var/lib/rabbitmq
config file(s) : /etc/rabbitmq/rabbitmq.config
cookie hash    : znKyrXfwLkBEHmOK3h9zxA==
log            : /var/log/rabbitmq/rabbit@<ipaddress>.log
sasl log       : /var/log/rabbitmq/rabbit@<ipaddress>-sasl.log
database dir   : /var/lib/rabbitmq/mnesia/rabbit@<ipaddress>

=INFO REPORT==== 16-Sep-2016::14:58:36 ===
autocluster: Delaying startup for 3270ms.

=INFO REPORT==== 16-Sep-2016::14:58:40 ===
autocluster: Starting aws registration.

=INFO REPORT==== 16-Sep-2016::14:58:40 ===
Error description:
   {could_not_start,rabbit,
       {function_clause,
           [{autocluster,maybe_register,
                [error,aws,autocluster_aws],
                [{file,"src/autocluster.erl"},{line,111}]},
            {autocluster,init,0,[{file,"src/autocluster.erl"},{line,33}]},
            {rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,
                [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
            {rabbit_boot_steps,run_step,2,
                [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
            {rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,
                [{file,"src/rabbit_boot_steps.erl"},{line,26}]},
            {rabbit_boot_steps,run_boot_steps,1,
                [{file,"src/rabbit_boot_steps.erl"},{line,26}]},
            {rabbit,start,2,[{file,"src/rabbit.erl"},{line,583}]},
            {application_master,start_it_old,4,
                [{file,"application_master.erl"},{line,273}]}]}}

Log files (may contain more information):
<this points to the current file>

Is there a step I'm missing? More importantly, could we get more meaningful information about the error in the logs?

Answer 1 · 2016-09-16T15:14:00.000Z

autocluster:maybe_register/3 failed, but there's little detail about what's going on. Please use proper GitHub formatting (code blocks); perhaps the current formatting is swallowing part of the log?

Answer 2 · 2016-09-16T15:33:43.000Z

Whoops. Is that better, @michaelklishin? Note that this is the entire log file.

Answer 3 · 2016-09-16T15:36:32.000Z

That at least contains a line in autocluster.erl, thank you.

Answer 4 · 2016-09-27T17:26:30.000Z

I'm running into what appears to be this same problem. I also have a crash report with a little more information. In contrast to @spember, however, I haven't been able to get the autocluster plugin to work at all.

=CRASH REPORT==== 27-Sep-2016::10:23:59 ===
  crasher:
    initial call: application_master:init/4
    pid: <0.155.0>
    registered_name: []
    exception exit: {bad_return,
                        {{rabbit,start,[normal,[]]},
                         {'EXIT',
                             {function_clause,
                                 [{autocluster,maybe_register,
                                      [error,aws,autocluster_aws],
                                      [{file,"src/autocluster.erl"},
                                       {line,111}]},
                                  {autocluster,init,0,
                                      [{file,"src/autocluster.erl"},
                                       {line,33}]},
                                  {rabbit_boot_steps,
                                      '-run_step/2-lc$^1/1-1-',1,
                                      [{file,"src/rabbit_boot_steps.erl"},
                                       {line,49}]},
                                  {rabbit_boot_steps,run_step,2,
                                      [{file,"src/rabbit_boot_steps.erl"},
                                       {line,49}]},
                                  {rabbit_boot_steps,
                                      '-run_boot_steps/1-lc$^0/1-0-',1,
                                      [{file,"src/rabbit_boot_steps.erl"},
                                       {line,26}]},
                                  {rabbit_boot_steps,run_boot_steps,1,
                                      [{file,"src/rabbit_boot_steps.erl"},
                                       {line,26}]},
                                  {rabbit,start,2,
                                      [{file,"src/rabbit.erl"},{line,583}]},
                                  {application_master,start_it_old,4,
                                      [{file,"application_master.erl"},
                                       {line,273}]}]}}}}
      in function  application_master:init/4 (application_master.erl, line 134)
    ancestors: [<0.154.0>]
    messages: [{'EXIT',<0.156.0>,normal}]
    links: [<0.154.0>,<0.7.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 2586
    stack_size: 27
    reductions: 255
  neighbours:

Answer 5 · 2016-10-28T09:22:49.000Z

This looks very similar to the issue I just raised, #104.

Answer 6 · 2016-11-13T15:03:30.000Z

#104 has a few comments that outline what seems to be going on. I'd close this one in favour of that issue.

Answer 7 · 2018-02-14T22:12:54.000Z

This plugin was forked by the RabbitMQ team and is now part of RabbitMQ. More information can be found at https://github.com/rabbitmq/rabbitmq-autocluster