emqx/emqx-bridge-mqtt

MQTT Bridge drops/reconnects continuously w/ multiple replicas

endianjoakim opened this issue · 10 comments

Hi,

I have a setup where I need to MQTT bridge EMQx servers in different kubernetes clusters.

Cluster A runs EMQx v4.0.6. 2 Replicas.
Cluster B runs EMQx v3.2.2. 3 Replicas.

I'm setting up the MQTT bridge from Cluster A -> Cluster B.

What I'm seeing is that the replicas in Cluster A seem to fight for the bridge connection: each of them tries to connect using the same MQTT clientid. When one replica connects, the other replica drops its connection, and vice versa. The effect is even more pronounced when I set 3 replicas on Cluster A. If I use a single replica on Cluster A, I do not see this issue.

Since the time between reconnects is 30 seconds (the reconnect timeout) when I have 2 replicas, and roughly either 10 seconds or 20 seconds when I use 3 replicas, I can conclude that the replicas do fight for the connection and disconnect each other. This is a classic example of multiple MQTT clients using the same clientid. See logs below.

-- Cluster A Logs -- 

##################
### 2 replicas ###
##################

(-@emqx-1)1> 2020-08-05 12:06:11.886 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:06:15.857 [notice] [Bridge] Bridge emqx_bridge_worker_voi discarded internal type event at state idle:maybe_send

(-@emqx-1)1> 2020-08-05 12:06:41.917 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:06:41.957 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-0)1> 2020-08-05 12:07:11.988 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:07:12.027 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:07:15.856 [notice] [Bridge] Bridge emqx_bridge_worker_voi discarded internal type event at state idle:maybe_send

(-@emqx-0)1> 2020-08-05 12:07:42.104 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-1)1> 2020-08-05 12:07:42.062 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected

(-@emqx-0)1> 2020-08-05 12:08:12.133 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:08:12.175 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:08:15.887 [notice] [Bridge] Bridge emqx_bridge_worker_voi discarded internal type event at state idle:maybe_send

(-@emqx-1)1> 2020-08-05 12:08:42.206 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:08:42.247 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-0)1> 2020-08-05 12:09:12.277 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:09:12.319 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:09:15.895 [notice] [Bridge] Bridge emqx_bridge_worker_voi discarded internal type event at state idle:maybe_send

(-@emqx-1)1> 2020-08-05 12:09:42.350 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:09:42.392 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-0)1> 2020-08-05 12:10:12.421 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:10:12.464 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

##################
### 3 replicas ###
##################

(-@emqx-2)1> 2020-08-05 12:48:41.075 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-2)1> 2020-08-05 12:49:01.235 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:49:01.275 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-0)1> 2020-08-05 12:49:08.956 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:49:08.999 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-1)1> 2020-08-05 12:49:31.306 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-2)1> 2020-08-05 12:49:31.348 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-2)1> 2020-08-05 12:49:39.028 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:49:39.070 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-0)1> 2020-08-05 12:50:01.378 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:50:01.419 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-1)1> 2020-08-05 12:50:09.099 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-2)1> 2020-08-05 12:50:09.141 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-2)1> 2020-08-05 12:50:31.451 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:50:31.494 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-0)1> 2020-08-05 12:50:39.172 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:50:39.211 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-1)1> 2020-08-05 12:51:01.523 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-2)1> 2020-08-05 12:51:01.565 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-2)1> 2020-08-05 12:51:09.245 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:51:09.284 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......

(-@emqx-0)1> 2020-08-05 12:51:31.595 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:51:31.634 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
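
The reconnect cadence can also be checked mechanically. The following is a minimal sketch in Python, assuming the log line layout shown above; the function name `connect_intervals` and the `sample` excerpt are illustrative only and not part of EMQ X.

```python
import re
from datetime import datetime

# Matches the "is connecting" lines in the log layout shown above.
CONNECTING = re.compile(
    r"\(-@(?P<node>[\w-]+)\)1> "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[info\] \[Bridge\] Bridge \S+ is connecting"
)

def connect_intervals(log_text):
    """Return (node, seconds since the previous connect attempt) pairs."""
    events = []
    for line in log_text.splitlines():
        m = CONNECTING.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, m.group("node")))
    events.sort()  # order by timestamp across nodes
    return [
        (node, round((ts - prev_ts).total_seconds(), 3))
        for (prev_ts, _), (ts, node) in zip(events, events[1:])
    ]

# A few lines copied from the 2-replica log.
sample = """\
(-@emqx-1)1> 2020-08-05 12:06:11.886 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-1)1> 2020-08-05 12:06:41.917 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:06:41.957 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-1)1> 2020-08-05 12:07:12.027 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
"""

for node, delta in connect_intervals(sample):
    print(node, delta)
```

On the 2-replica excerpt the connect attempts alternate between the two nodes about 30 seconds apart, which lines up with reconnect_interval = 30s in the config.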

My bridge config on Cluster A.

##====================================================================
## Configuration for EMQ X MQTT Broker Bridge
##====================================================================
bridge.mqtt.voi.address = xx.xx.xx.xx:1883
bridge.mqtt.voi.proto_ver = mqttv4
bridge.mqtt.voi.start_type = auto
bridge.mqtt.voi.bridge_mode = true
bridge.mqtt.voi.clientid = mqtt_bridge_client
bridge.mqtt.voi.clean_start = true
bridge.mqtt.voi.username = guest
bridge.mqtt.voi.password = removed
bridge.mqtt.voi.forwards = t1/#,t2/#
bridge.mqtt.voi.subscription.1.topic = r1/#
bridge.mqtt.voi.subscription.1.qos = 1
#bridge.mqtt.voi.forward_mountpoint = ""
#bridge.mqtt.voi.receive_mountpoint = ""
bridge.mqtt.voi.ssl = off
#bridge.mqtt.voi.cacertfile = etc/certs/cacert.pem
#bridge.mqtt.voi.certfile = etc/certs/client-cert.pem
#bridge.mqtt.voi.keyfile = etc/certs/client-key.pem
#bridge.mqtt.voi.ciphers = ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384,ECDHE-ECDSA-AES256-SHA384,ECDHE-RSA-AES256-SHA384,ECDHE-ECDSA-DES-CBC3-SHA,ECDH-ECDSA-AES256-GCM-SHA384,ECDH-RSA-AES256-GCM-SHA384,ECDH-ECDSA-AES256-SHA384,ECDH-RSA-AES256-SHA384,DHE-DSS-AES256-GCM-SHA384,DHE-DSS-AES256-SHA256,AES256-GCM-SHA384,AES256-SHA256,ECDHE-ECDSA-AES128-GCM-SHA256,ECDHE-RSA-AES128-GCM-SHA256,ECDHE-ECDSA-AES128-SHA256,ECDHE-RSA-AES128-SHA256,ECDH-ECDSA-AES128-GCM-SHA256,ECDH-RSA-AES128-GCM-SHA256,ECDH-ECDSA-AES128-SHA256,ECDH-RSA-AES128-SHA256,DHE-DSS-AES128-GCM-SHA256,DHE-DSS-AES128-SHA256,AES128-GCM-SHA256,AES128-SHA256,ECDHE-ECDSA-AES256-SHA,ECDHE-RSA-AES256-SHA,DHE-DSS-AES256-SHA,ECDH-ECDSA-AES256-SHA,ECDH-RSA-AES256-SHA,AES256-SHA,ECDHE-ECDSA-AES128-SHA,ECDHE-RSA-AES128-SHA,DHE-DSS-AES128-SHA,ECDH-ECDSA-AES128-SHA,ECDH-RSA-AES128-SHA,AES128-SHA
#bridge.mqtt.voi.psk_ciphers = PSK-AES128-CBC-SHA,PSK-AES256-CBC-SHA,PSK-3DES-EDE-CBC-SHA,PSK-RC4-SHA
bridge.mqtt.voi.keepalive = 60s
bridge.mqtt.voi.tls_versions = tlsv1.2,tlsv1.1,tlsv1
bridge.mqtt.voi.reconnect_interval = 30s
bridge.mqtt.voi.retry_interval = 20s
bridge.mqtt.voi.batch_size = 32
bridge.mqtt.voi.max_inflight_size = 32
bridge.mqtt.voi.queue.replayq_dir = data/emqx_bridge/
bridge.mqtt.voi.queue.replayq_seg_bytes = 10MB
bridge.mqtt.voi.queue.max_total_size = 5GB

This is unexpected. I need to have multiple replicas in Cluster A, and only a single bridge connection to Cluster B.

  1. Any ideas on how to accomplish this?
  2. Could the EMQx server version mismatch cause this?
  3. If the replicas get unique MQTT clientids, e.g. mqtt_bridge_client_1, mqtt_bridge_client_2, mqtt_bridge_client_3, but they publish and subscribe to the same topics, would that cause duplicate MQTT messages to be sent and received?
  4. Is this a configuration issue?

Thanks

Hi @endianjoakim, your suspicions are correct. In the current implementation, there are two options for bridging data between clusters:

  1. Select only one node in cluster A to bridge messages across the clusters, and disable the bridging plug-in on the other nodes.

  2. Follow the approach from your question 3 (a unique clientid per replica), but use a shared subscription to avoid receiving duplicate messages.

Hi @HJianBo, thanks for the fast reply.

Ok, for option 2.

  • Can I script the clientid in the bridge config? e.g bridge.mqtt.voi.clientid = mqtt_bridge_client_${node}?
  • How do I create shared subscriptions? Can I do that in the bridge config?

Thanks

  1. Of course.
  2. See: https://docs.emqx.io/broker/latest/en/advanced/shared-subscriptions.html

E.g:

bridge.mqtt.voi.forwards = $share/grpname/t1/#,$share/grpname/t2/#
bridge.mqtt.voi.subscription.1.topic = $share/grpname/r1/#
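
To illustrate why the $share prefix avoids duplicates when every replica has its own clientid, here is a toy model in Python. It is only a sketch under stated assumptions: `ToyBroker` and `parse_shared` are hypothetical names, topics are matched as exact strings (no wildcards), and members of a group are picked round-robin, whereas the real broker's dispatch strategy is configurable.

```python
from collections import defaultdict
from itertools import cycle

def parse_shared(topic_filter):
    """Split '$share/<group>/<filter>' into (group, filter); group is None if not shared."""
    if topic_filter.startswith("$share/"):
        _, group, real_filter = topic_filter.split("/", 2)
        return group, real_filter
    return None, topic_filter

class ToyBroker:
    """Toy delivery model: exact-match topics only, round-robin within a share group."""

    def __init__(self):
        self.plain = defaultdict(list)    # filter -> [client ids]
        self.groups = defaultdict(list)   # (group, filter) -> [client ids]
        self.pickers = {}                 # (group, filter) -> round-robin iterator

    def subscribe(self, client_id, topic_filter):
        group, real_filter = parse_shared(topic_filter)
        if group is None:
            self.plain[real_filter].append(client_id)
        else:
            key = (group, real_filter)
            self.groups[key].append(client_id)
            # Rebuild the iterator so new members take part (assumed strategy).
            self.pickers[key] = cycle(list(self.groups[key]))

    def publish(self, topic):
        """Return the client ids that receive one message published on `topic`."""
        receivers = list(self.plain.get(topic, []))
        for (_group, real_filter), picker in self.pickers.items():
            if real_filter == topic:
                receivers.append(next(picker))  # exactly one member per group
        return receivers

plain, shared = ToyBroker(), ToyBroker()
for node in ("emqx-0", "emqx-1"):
    plain.subscribe(f"mqtt_bridge_client_{node}", "r1/a")
    shared.subscribe(f"mqtt_bridge_client_{node}", "$share/grpname/r1/a")

print(plain.publish("r1/a"))   # every plain subscriber gets a copy
print(shared.publish("r1/a"))  # one member of the group gets the message
```

With plain subscriptions each bridge client receives its own copy of the message, so both replicas would see it; with a shared subscription only one member of the group receives each message.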

Excellent, I'll try it out.

Thank you very much!

  • Can I script the clientid in the bridge config? e.g bridge.mqtt.voi.clientid = mqtt_bridge_client_${node}?
  • Of course.

This does not seem to work.

With bridge.mqtt.voi.clientid = mqtt_bridge_client_${node} I get the same disconnection issue using 2 replicas.

In the logs, it seems ${node} was not parsed, but used literally. Notice the node name on the left in the log, and the clientid in the middle.

(-@emqx-0)1> 2020-08-06 08:20:15.357 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<32,2,0,0>>
(-@emqx-0)1> 2020-08-06 08:20:15.357 [debug] emqtt(mqtt_bridge_client_${node}): SEND Data: {mqtt_packet,{mqtt_packet_header,8,false,1,false},{mqtt_packet_subscribe,2,#{},[{<<"$share/voi_bridge/t1/#">>,#{nl => 0,qos => 1,rap => 0,rh => 0}}]},undefined}
(-@emqx-0)1> 2020-08-06 08:20:15.397 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<144,3,0,2,1>>
(-@emqx-0)1> 2020-08-06 08:20:45.429 [debug] emqtt(mqtt_bridge_client_${node}): tcp_closed
(-@emqx-1)1> 2020-08-06 08:20:45.429 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<32,2,0,0>>
(-@emqx-1)1> 2020-08-06 08:20:45.429 [debug] emqtt(mqtt_bridge_client_${node}): SEND Data: {mqtt_packet,{mqtt_packet_header,8,false,1,false},{mqtt_packet_subscribe,2,#{},[{<<"$share/voi_bridge/t1/#">>,#{nl => 0,qos => 1,rap => 0,rh => 0}}]},undefined}
(-@emqx-1)1> 2020-08-06 08:20:45.472 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<144,3,0,2,1>>
(-@emqx-1)1> 2020-08-06 08:21:07.248 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<50,59,0,17,99,47,57,48,49,48,49,48,48,
(-@emqx-1)1> 2020-08-06 08:21:07.248 [debug] emqtt(mqtt_bridge_client_${node}): SEND Data: {mqtt_packet,{mqtt_packet_header,4,false,0,false},{mqtt_packet_puback,1,0,undefined},undefined}
(-@emqx-0)1> 2020-08-06 08:21:15.464 [debug] emqtt(mqtt_bridge_client_${node}): SEND Data: {mqtt_packet,{mqtt_packet_header,1,false,0,false},{mqtt_packet_connect,<<"MQTT">>,4,true,true,false,0,false,60,#{},<<"mqtt_bridge_client_${node}">>,undefined,undefined,undefined,<<"guest">>,<<"removed">>},undefined}
(-@emqx-0)1> 2020-08-06 08:21:15.501 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<32,2,0,0>>
(-@emqx-0)1> 2020-08-06 08:21:15.501 [debug] emqtt(mqtt_bridge_client_${node}): SEND Data: {mqtt_packet,{mqtt_packet_header,8,false,1,false},{mqtt_packet_subscribe,2,#{},[{<<"$share/voi_bridge/t1/#">>,#{nl => 0,qos => 1,rap => 0,rh => 0}}]},undefined}
(-@emqx-1)1> 2020-08-06 08:21:15.501 [debug] emqtt(mqtt_bridge_client_${node}): tcp_closed
(-@emqx-0)1> 2020-08-06 08:21:15.541 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<144,3,0,2,1>>
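
For readers unfamiliar with the raw bytes in the debug log, the following Python sketch decodes the first bytes of the RECV Data entries according to the MQTT 3.1.1 fixed header layout. The helper `describe` is illustrative, handles only the packet types that appear here, and assumes the remaining-length field fits in one byte, as it does in these entries.

```python
# MQTT control packet type names, keyed by the high nibble of the first byte.
PACKET_TYPES = {
    1: "CONNECT", 2: "CONNACK", 3: "PUBLISH", 4: "PUBACK",
    8: "SUBSCRIBE", 9: "SUBACK", 12: "PINGREQ", 13: "PINGRESP", 14: "DISCONNECT",
}

def describe(packet):
    """Summarise the leading bytes of an MQTT 3.1.1 packet given as a list of ints."""
    ptype, flags = packet[0] >> 4, packet[0] & 0x0F
    name = PACKET_TYPES.get(ptype, f"type {ptype}")
    if name == "CONNACK":
        # Byte 2: acknowledge flags (bit 0 = session present); byte 3: return code.
        return f"CONNACK session_present={packet[2] & 1} return_code={packet[3]}"
    if name == "SUBACK":
        # Bytes 2-3: packet identifier; remaining bytes: one return code per filter.
        packet_id = (packet[2] << 8) | packet[3]
        return f"SUBACK packet_id={packet_id} granted_qos={list(packet[4:])}"
    if name == "PUBLISH":
        # Flags nibble: DUP (bit 3), QoS (bits 2-1), RETAIN (bit 0).
        return f"PUBLISH qos={(flags >> 1) & 0b11} dup={flags >> 3} retain={flags & 1}"
    return name

print(describe([32, 2, 0, 0]))       # the <<32,2,0,0>> entries
print(describe([144, 3, 0, 2, 1]))   # the <<144,3,0,2,1>> entries
print(describe([50, 59, 0, 17]))     # start of the <<50,59,0,17,...>> entry
```

Read this way, <<32,2,0,0>> is a CONNACK with return code 0 (connection accepted) and <<144,3,0,2,1>> is a SUBACK granting QoS 1. Each replica's connect succeeds at the same moment the other replica logs tcp_closed, which is what a clientid takeover looks like.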

Could this be a bug?

Thanks

emqtt(mqtt_bridge_client_${node})

You should replace ${node} with the real node name.

Ok, I misunderstood. You mean the config parser will not automatically replace the ${node} variable with the node name, e.g. emqx-1? I have to do it manually?

Since I use this with kubernetes I would prefer to use the same config for all replicas, for simpler scaling. Would it be possible to add the ${node} variable parsing to the clientid config?

The parser seems to replace the ${node} variable in other places in the bridge config. E.g in the forward_mountpoint, see https://github.com/emqx/emqx-bridge-mqtt/blob/master/etc/emqx_bridge_mqtt.conf#L71
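
As long as ${node} in the clientid is taken literally, one possible workaround is to render the config per replica before the broker starts. This is a minimal sketch in Python, assuming each replica has a unique hostname (the emqx-0 and emqx-1 names in the logs suggest this, but it is an assumption); `render_bridge_config` is a hypothetical helper, and how the rendered file gets wired into the container start-up is left open.

```python
import socket

def render_bridge_config(template, node=None):
    """Replace the literal ${node} placeholder with a per-replica name."""
    if node is None:
        # Assumption: the hostname is unique per replica, e.g. emqx-0, emqx-1.
        node = socket.gethostname()
    # Plain string replacement, so other "$" sequences such as $share/... are untouched.
    return template.replace("${node}", node)

template = (
    "bridge.mqtt.voi.clientid = mqtt_bridge_client_${node}\n"
    "bridge.mqtt.voi.subscription.1.topic = $share/grpname/r1/#\n"
)

print(render_bridge_config(template, node="emqx-0"), end="")
```

The same template can then be shipped to every replica, and each one ends up with its own clientid.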

Thanks

Thanks for your suggestion.

We will release it in the next version.

Thanks for the prompt reply! 😄

Verified to work in v4.2. This can be closed. 😄 👍