aweber/rabbitmq-autocluster

See lots of 'Error updating Consul health check: "500"' messages in the logs

cap10morgan opened this issue · 8 comments

I'm using the 0.6.1 release w/ Consul 0.7.1 and RabbitMQ 3.6.5. The cluster starts up OK, but then I see lots of messages like this in the RabbitMQ logs:

=ERROR REPORT==== 30-Nov-2016::17:34:10 ===
autocluster: Error updating Consul health check: "500"

I'm not sure how concerned I should be by that, things generally seem to work. But it doesn't fill me with confidence either.

gmr commented

My guess is you're getting a 500 error back from consul. I'd check the consul logs to see what it has to say.

Errors from Consul log look like this:

2016/11/30 17:36:55 [ERR] http: Request GET /v1/agent/check/pass/service%3Arabbitmq, error: CheckID "service:rabbitmq" does not have associated TTL from=172.17.0.2:34919

gmr commented

Sorry for the delay, what version of Consul?

0.7.1

I could reproduce the same bug by stopping the consul node while the containers are running:

   2017/03/01 18:05:28 [ERR] http: Request GET /v1/agent/check/pass/service%3Arabbitmq-test1%3A192.168.1.76, error: CheckID "service:rabbitmq-test1:192.168.1.76" does not have associated TTL from=172.20.0.3:49147
    2017/03/01 18:05:28 [DEBUG] http: Request GET /v1/agent/check/pass/service%3Arabbitmq-test1%3A192.168.1.76 (247.449µs) from=172.20.0.3:49147
    2017/03/01 18:05:31 [ERR] http: Request GET /v1/agent/check/pass/service%3Arabbitmq-test1%3A192.168.1.76, error: CheckID "service:rabbitmq-test1:192.168.1.76" does not have associated TTL from=172.20.0.2:52632
    2017/03/01 18:05:31 [DEBUG] http: Request GET /v1/agent/check/pass/service%3Arabbitmq-test1%3A192.168.1.76 (338.155µs) from=172.20.0.2:52632
    2017/03/01 18:05:43 [ERR] http: Request GET /v1/agent/check/pass/service%3Arabbitmq-test1%3A192.168.1.76, error: CheckID "service:rabbitmq-test1:192.168.1.76" does not have associated TTL from=172.20.0.3:51063
    2017/03/01 18:05:43 [DEBUG] http: Request GET /v1/agent/check/pass/service%3Arabbitmq-test1%3A192.168.1.76 (75.139µs) from=172.20.0.3:51063
    2017/03/01 18:05:46 [ERR] http: Request GET /v1/agent/check/pass/service%3Arabbitmq-test1%3A192.168.1.76, error: CheckID "service:rabbitmq-test1:192.168.1.76" does not have associated TTL from=172.20.0.2:36984
    2017/03/01 18:05:46 [DEBUG] http: Request GET /v1/agent/check/pass/service%3Arabbitmq-test1%3A192.168.1.76 (225.301µs) from=172.20.0.2:36984
    2017/03/01 18:05:58 [ERR] http: Request GET /v1/agent/check/pass/service%3Arabbitmq-test1%3A192.168.1.76, error: CheckID "service:rabbitmq-test1:192.168.1.76" does not have associated TTL from=172.20.0.3:39587
    2017/03/01 18:05:58 [DEBUG] http: Request GET /v1/agent/check/pass/service%3Arabbitmq-test1%3A192.168.1.76 (105.067µs) from=172.20.0.3:39587
    2017/03/01 18:06:01 [ERR] http: Request GET /v1/agent/check/pass/service%3Arabbitmq-test1%3A192.168.1.76, error: CheckID "service:rabbitmq-test1:192.168.1.76" does not have associated TTL

and RabbitMQ side:

rabbit_node_1_1  | =ERROR REPORT==== 1-Mar-2017::17:06:01 ===
rabbit_node_1_1  | autocluster: Error updating Consul health check: "500"
rabbit_node_2_1  |
rabbit_node_2_1  | =ERROR REPORT==== 1-Mar-2017::17:06:13 ===
rabbit_node_2_1  | autocluster: Error updating Consul health check: "500"
rabbit_node_1_1  |
rabbit_node_1_1  | =ERROR REPORT==== 1-Mar-2017::17:06:16 ===
rabbit_node_1_1  | autocluster: Error updating Consul health check: "500"
rabbit_node_2_1  |
rabbit_node_2_1  | =ERROR REPORT==== 1-Mar-2017::17:06:28 ===
rabbit_node_2_1  | autocluster: Error updating Consul health check: "500"
rabbit_node_1_1  |
rabbit_node_1_1  | =ERROR REPORT==== 1-Mar-2017::17:06:31 ===
rabbit_node_1_1  | autocluster: Error updating Consul health check: "500"
rabbit_node_2_1  |
rabbit_node_2_1  | =ERROR REPORT==== 1-Mar-2017::17:06:43 ===
rabbit_node_2_1  | autocluster: Error updating Consul health check: "500"
rabbit_node_1_1  |
rabbit_node_1_1  | =ERROR REPORT==== 1-Mar-2017::17:06:46 ===
rabbit_node_1_1  | autocluster: Error updating Consul health check: "500"
rabbit_node_2_1  |
rabbit_node_2_1  | =ERROR REPORT==== 1-Mar-2017::17:06:58 ===
rabbit_node_2_1  | autocluster: Error updating Consul health check: "500"
rabbit_node_1_1  |
rabbit_node_1_1  | =ERROR REPORT==== 1-Mar-2017::17:07:01 ===
rabbit_node_1_1  | autocluster: Error updating Consul health check: "500"

could be the same situation @cap10morgan ?

@Gsantomaggio Not in my case, no. Consul is running the entire time.

Addressed in team RabbitMQ's fork: rabbitmq#2.

gmr commented

This plugin was forked by the RabbitMQ team and is now part of RabbitMQ. More information can be found @ https://github.com/rabbitmq/rabbitmq-autocluster