MQTT Bridge drops/reconnects continuously w/ multiple replicas
endianjoakim opened this issue · 10 comments
Hi,
I have a setup where I need to MQTT bridge EMQx servers in different kubernetes clusters.
Cluster A runs EMQx v4.0.6. 2 Replicas.
Cluster B runs EMQx v3.2.2. 3 Replicas.
I'm setting up the MQTT bridge from Cluster A -> Cluster B.
What I'm seeing is that the replicas in Cluster A seem to fight for the bridge connection, with each of them trying to connect using the same MQTT clientid. When one replica connects, the other replica's connection is dropped, and vice versa. The effect is even more pronounced when I set 3 replicas on Cluster A. If I use a single replica on Cluster A, I do not see this issue.
Since the time between reconnects is 30 seconds (the configured reconnect_interval) when I have 2 replicas, and either 10 or 20 seconds when I use 3 replicas, I conclude that the replicas do fight for the connection and disconnect each other. This is a classic example of multiple MQTT clients using the same clientid. See logs below.
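To make the suspected mechanism concrete, here is a toy simulation. It assumes the remote broker keeps only one connection per clientid and that a kicked replica retries after the reconnect interval. The function name and the start offsets are made up for illustration; this is not how EMQ X is implemented.

```python
import heapq


def simulate_takeover(start_offsets, reconnect_interval=30.0, until=120.0):
    """Toy model of clientid takeover.

    Assumption: the broker keeps one connection per clientid, so every new
    CONNECT kicks the current holder, and the kicked replica retries after
    reconnect_interval seconds.
    """
    # Each event is (time of CONNECT attempt, replica index).
    events = [(t, replica) for replica, t in enumerate(start_offsets)]
    heapq.heapify(events)
    holder = None
    timeline = []  # (time, replica that connected, replica that was kicked)
    while events:
        now, replica = heapq.heappop(events)
        if now > until:
            break
        kicked = holder
        if kicked is not None and kicked != replica:
            # The kicked replica schedules its own reconnect.
            heapq.heappush(events, (now + reconnect_interval, kicked))
        holder = replica
        timeline.append((now, replica, kicked))
    return timeline


if __name__ == "__main__":
    for now, replica, kicked in simulate_takeover([0.0, 10.0]):
        print(f"t={now:6.1f}s replica {replica} connects, kicks {kicked}")
```

With two replicas the model settles into one takeover per reconnect interval, which is the same alternation the 2-replica log shows. With a single replica nobody is ever kicked.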
-- Cluster A Logs --
##################
### 2 replicas ###
##################
(-@emqx-1)1> 2020-08-05 12:06:11.886 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:06:15.857 [notice] [Bridge] Bridge emqx_bridge_worker_voi discarded internal type event at state idle:maybe_send
(-@emqx-1)1> 2020-08-05 12:06:41.917 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:06:41.957 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:07:11.988 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:07:12.027 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:07:15.856 [notice] [Bridge] Bridge emqx_bridge_worker_voi discarded internal type event at state idle:maybe_send
(-@emqx-0)1> 2020-08-05 12:07:42.104 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-1)1> 2020-08-05 12:07:42.062 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:08:12.133 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:08:12.175 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:08:15.887 [notice] [Bridge] Bridge emqx_bridge_worker_voi discarded internal type event at state idle:maybe_send
(-@emqx-1)1> 2020-08-05 12:08:42.206 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:08:42.247 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:09:12.277 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:09:12.319 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:09:15.895 [notice] [Bridge] Bridge emqx_bridge_worker_voi discarded internal type event at state idle:maybe_send
(-@emqx-1)1> 2020-08-05 12:09:42.350 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:09:42.392 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:10:12.421 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:10:12.464 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
##################
### 3 replicas ###
##################
(-@emqx-2)1> 2020-08-05 12:48:41.075 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-2)1> 2020-08-05 12:49:01.235 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:49:01.275 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:49:08.956 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:49:08.999 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-1)1> 2020-08-05 12:49:31.306 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-2)1> 2020-08-05 12:49:31.348 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-2)1> 2020-08-05 12:49:39.028 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:49:39.070 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:50:01.378 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:50:01.419 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-1)1> 2020-08-05 12:50:09.099 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-2)1> 2020-08-05 12:50:09.141 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-2)1> 2020-08-05 12:50:31.451 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:50:31.494 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:50:39.172 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:50:39.211 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-1)1> 2020-08-05 12:51:01.523 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-2)1> 2020-08-05 12:51:01.565 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-2)1> 2020-08-05 12:51:09.245 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:51:09.284 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:51:31.595 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:51:31.634 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
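The intervals can be read off the logs above with a small script. This is only a sketch: the regular expression is written against the exact log format shown here, and the sample contains the first few lines of the 2-replica log.

```python
import re
from datetime import datetime

# Matches only the "is connecting" lines of the bridge worker.
LINE_RE = re.compile(
    r"\(-@(?P<node>[\w-]+)\)\d+> "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[info\] \[Bridge\] Bridge \S+ is connecting"
)


def connect_gaps(log_text):
    """Return (node, seconds since the previous connect attempt) pairs."""
    attempts = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            attempts.append((ts, m.group("node")))
    attempts.sort()  # the log lines are not strictly ordered by time
    return [
        (node, round((ts - prev_ts).total_seconds(), 3))
        for (prev_ts, _), (ts, node) in zip(attempts, attempts[1:])
    ]


SAMPLE = """\
(-@emqx-1)1> 2020-08-05 12:06:11.886 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-1)1> 2020-08-05 12:06:41.917 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-0)1> 2020-08-05 12:06:41.957 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
(-@emqx-0)1> 2020-08-05 12:07:11.988 [info] [Bridge] Bridge emqx_bridge_worker_voi diconnected
(-@emqx-1)1> 2020-08-05 12:07:12.027 [info] [Bridge] Bridge emqx_bridge_worker_voi is connecting......
"""

if __name__ == "__main__":
    for node, gap in connect_gaps(SAMPLE):
        print(node, gap)
```

Feeding the full log of either run into connect_gaps gives the spacing between connect attempts per node, which can then be compared with bridge.mqtt.voi.reconnect_interval.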
My bridge config on Cluster A.
##====================================================================
## Configuration for EMQ X MQTT Broker Bridge
##====================================================================
bridge.mqtt.voi.address = xx.xx.xx.xx:1883
bridge.mqtt.voi.proto_ver = mqttv4
bridge.mqtt.voi.start_type = auto
bridge.mqtt.voi.bridge_mode = true
bridge.mqtt.voi.clientid = mqtt_bridge_client
bridge.mqtt.voi.clean_start = true
bridge.mqtt.voi.username = guest
bridge.mqtt.voi.password = removed
bridge.mqtt.voi.forwards = t1/#,t2/#
bridge.mqtt.voi.subscription.1.topic = r1/#
bridge.mqtt.voi.subscription.1.qos = 1
#bridge.mqtt.voi.forward_mountpoint = ""
#bridge.mqtt.voi.receive_mountpoint = ""
bridge.mqtt.voi.ssl = off
#bridge.mqtt.voi.cacertfile = etc/certs/cacert.pem
#bridge.mqtt.voi.certfile = etc/certs/client-cert.pem
#bridge.mqtt.voi.keyfile = etc/certs/client-key.pem
#bridge.mqtt.voi.ciphers = ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384,ECDHE-ECDSA-AES256-SHA384,ECDHE-RSA-AES256-SHA384,ECDHE-ECDSA-DES-CBC3-SHA,ECDH-ECDSA-AES256-GCM-SHA384,ECDH-RSA-AES256-GCM-SHA384,ECDH-ECDSA-AES256-SHA384,ECDH-RSA-AES256-SHA384,DHE-DSS-AES256-GCM-SHA384,DHE-DSS-AES256-SHA256,AES256-GCM-SHA384,AES256-SHA256,ECDHE-ECDSA-AES128-GCM-SHA256,ECDHE-RSA-AES128-GCM-SHA256,ECDHE-ECDSA-AES128-SHA256,ECDHE-RSA-AES128-SHA256,ECDH-ECDSA-AES128-GCM-SHA256,ECDH-RSA-AES128-GCM-SHA256,ECDH-ECDSA-AES128-SHA256,ECDH-RSA-AES128-SHA256,DHE-DSS-AES128-GCM-SHA256,DHE-DSS-AES128-SHA256,AES128-GCM-SHA256,AES128-SHA256,ECDHE-ECDSA-AES256-SHA,ECDHE-RSA-AES256-SHA,DHE-DSS-AES256-SHA,ECDH-ECDSA-AES256-SHA,ECDH-RSA-AES256-SHA,AES256-SHA,ECDHE-ECDSA-AES128-SHA,ECDHE-RSA-AES128-SHA,DHE-DSS-AES128-SHA,ECDH-ECDSA-AES128-SHA,ECDH-RSA-AES128-SHA,AES128-SHA
#bridge.mqtt.voi.psk_ciphers = PSK-AES128-CBC-SHA,PSK-AES256-CBC-SHA,PSK-3DES-EDE-CBC-SHA,PSK-RC4-SHA
bridge.mqtt.voi.keepalive = 60s
bridge.mqtt.voi.tls_versions = tlsv1.2,tlsv1.1,tlsv1
bridge.mqtt.voi.reconnect_interval = 30s
bridge.mqtt.voi.retry_interval = 20s
bridge.mqtt.voi.batch_size = 32
bridge.mqtt.voi.max_inflight_size = 32
bridge.mqtt.voi.queue.replayq_dir = data/emqx_bridge/
bridge.mqtt.voi.queue.replayq_seg_bytes = 10MB
bridge.mqtt.voi.queue.max_total_size = 5GB
This is unexpected. I need to have multiple replicas in Cluster A, and only a single bridge connection to Cluster B.
- Any ideas on how to accomplish this?
- Could the EMQx server versions mismatch cause this?
- If the replicas get unique MQTT clientids, e.g. mqtt_bridge_client_1, mqtt_bridge_client_2, mqtt_bridge_client_3, but they publish and subscribe to the same topics, would that cause duplicated MQTT messages being sent and received?
- Is this a configuration issue?
Thanks
Hi @endianjoakim, your suspicions are correct. In the current implementation, there are two options for bridging data between clusters:
- Select only one node in cluster A to bridge messages to the other cluster, and disable the bridge plug-in on the other nodes.
- Follow the idea in your third question (a unique clientid per replica), but use a shared subscription to avoid receiving duplicate messages.
Hi @HJianBo, thanks for the fast reply.
Ok, for option 2.
- Can I script the clientid in the bridge config? e.g. bridge.mqtt.voi.clientid = mqtt_bridge_client_${node}?
- How do I create shared subscriptions? Can I do that in the bridge config?
Thanks
E.g:
bridge.mqtt.voi.forwards = $share/grpname/t1/#,$share/grpname/t2/#
bridge.mqtt.voi.subscription.1.topic = $share/grpname/r1/#
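To illustrate why the $share prefix avoids duplicates, here is a toy model of shared-subscription dispatch. It assumes the usual MQTT shared-subscription semantics (each matching message goes to one member of the group) and uses round-robin selection purely for illustration; ToyBroker, parse_filter and matches are hypothetical names, and the real broker's selection strategy may differ.

```python
from collections import defaultdict


def parse_filter(topic_filter):
    """Split '$share/<group>/<filter>' into (group, filter).

    For a normal subscription the group is None.
    """
    if topic_filter.startswith("$share/"):
        _, group, real_filter = topic_filter.split("/", 2)
        return group, real_filter
    return None, topic_filter


def matches(topic_filter, topic):
    """Minimal MQTT topic matching with '+' and '#' wildcards."""
    f_parts, t_parts = topic_filter.split("/"), topic.split("/")
    for i, f in enumerate(f_parts):
        if f == "#":
            return True
        if i >= len(t_parts) or (f != "+" and f != t_parts[i]):
            return False
    return len(f_parts) == len(t_parts)


class ToyBroker:
    def __init__(self):
        self.plain = []                   # [(client, filter)]
        self.groups = defaultdict(list)   # (group, filter) -> [client, ...]
        self.cursor = defaultdict(int)    # round-robin position per group

    def subscribe(self, client, topic_filter):
        group, real_filter = parse_filter(topic_filter)
        if group is None:
            self.plain.append((client, real_filter))
        else:
            self.groups[(group, real_filter)].append(client)

    def publish(self, topic):
        """Return the list of clients that would receive this message."""
        receivers = [c for c, f in self.plain if matches(f, topic)]
        for key, members in self.groups.items():
            if matches(key[1], topic):
                # Only one member of the shared group gets the message.
                receivers.append(members[self.cursor[key] % len(members)])
                self.cursor[key] += 1
        return receivers


if __name__ == "__main__":
    broker = ToyBroker()
    broker.subscribe("mqtt_bridge_client_emqx-0", "$share/grpname/r1/#")
    broker.subscribe("mqtt_bridge_client_emqx-1", "$share/grpname/r1/#")
    for _ in range(4):
        print(broker.publish("r1/sensor"))
```

In this model each publish to r1/... reaches exactly one of the two bridge clients, while two plain r1/# subscriptions would each receive a copy.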
Excellent, I'll try it out.
Thank you very much!
- Can I script the clientid in the bridge config? e.g bridge.mqtt.voi.clientid = mqtt_bridge_client_${node}?
- Of course.
This does not seem to work.
With bridge.mqtt.voi.clientid = mqtt_bridge_client_${node}
I get the same disconnection issue using 2 replicas.
In the logs, it seems ${node} was not parsed, but used literally. Notice the node name on the left in the log, and the clientid in the middle.
(-@emqx-0)1> 2020-08-06 08:20:15.357 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<32,2,0,0>>
(-@emqx-0)1> 2020-08-06 08:20:15.357 [debug] emqtt(mqtt_bridge_client_${node}): SEND Data: {mqtt_packet,{mqtt_packet_header,8,false,1,false},{mqtt_packet_subscribe,2,#{},[{<<"$share/voi_bridge/t1/#">>,#{nl => 0,qos => 1,rap => 0,rh => 0}}]},undefined}
(-@emqx-0)1> 2020-08-06 08:20:15.397 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<144,3,0,2,1>>
(-@emqx-0)1> 2020-08-06 08:20:45.429 [debug] emqtt(mqtt_bridge_client_${node}): tcp_closed
(-@emqx-1)1> 2020-08-06 08:20:45.429 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<32,2,0,0>>
(-@emqx-1)1> 2020-08-06 08:20:45.429 [debug] emqtt(mqtt_bridge_client_${node}): SEND Data: {mqtt_packet,{mqtt_packet_header,8,false,1,false},{mqtt_packet_subscribe,2,#{},[{<<"$share/voi_bridge/t1/#">>,#{nl => 0,qos => 1,rap => 0,rh => 0}}]},undefined}
(-@emqx-1)1> 2020-08-06 08:20:45.472 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<144,3,0,2,1>>
(-@emqx-1)1> 2020-08-06 08:21:07.248 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<50,59,0,17,99,47,57,48,49,48,49,48,48,
(-@emqx-1)1> 2020-08-06 08:21:07.248 [debug] emqtt(mqtt_bridge_client_${node}): SEND Data: {mqtt_packet,{mqtt_packet_header,4,false,0,false},{mqtt_packet_puback,1,0,undefined},undefined}
(-@emqx-0)1> 2020-08-06 08:21:15.464 [debug] emqtt(mqtt_bridge_client_${node}): SEND Data: {mqtt_packet,{mqtt_packet_header,1,false,0,false},{mqtt_packet_connect,<<"MQTT">>,4,true,true,false,0,false,60,#{},<<"mqtt_bridge_client_${node}">>,undefined,undefined,undefined,<<"guest">>,<<"removed">>},undefined}
(-@emqx-0)1> 2020-08-06 08:21:15.501 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<32,2,0,0>>
(-@emqx-0)1> 2020-08-06 08:21:15.501 [debug] emqtt(mqtt_bridge_client_${node}): SEND Data: {mqtt_packet,{mqtt_packet_header,8,false,1,false},{mqtt_packet_subscribe,2,#{},[{<<"$share/voi_bridge/t1/#">>,#{nl => 0,qos => 1,rap => 0,rh => 0}}]},undefined}
(-@emqx-1)1> 2020-08-06 08:21:15.501 [debug] emqtt(mqtt_bridge_client_${node}): tcp_closed
(-@emqx-0)1> 2020-08-06 08:21:15.541 [debug] emqtt(mqtt_bridge_client_${node}): RECV Data: <<144,3,0,2,1>>
Could this be a bug?
Thanks
emqtt(mqtt_bridge_client_${node})
You should replace ${node} with the real node name.
Ok, I misunderstood. You mean the config parser will not replace the ${node} variable with the node name, e.g. emq-1, automatically? I have to do it manually?
Since I use this with kubernetes I would prefer to use the same config for all replicas, for simpler scaling. Would it be possible to add the ${node} variable parsing to the clientid config?
The parser seems to replace the ${node} variable in other places in the bridge config, e.g. in the forward_mountpoint, see https://github.com/emqx/emqx-bridge-mqtt/blob/master/etc/emqx_bridge_mqtt.conf#L71
Thanks
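For versions where ${node} is not expanded in clientid, one possible workaround is to render the config per pod before the broker starts. The sketch below assumes the replicas run as a StatefulSet, so each pod's hostname is something like emqx-0 or emqx-1 (as the node names in the logs suggest). The helper names are hypothetical, and how the rendered file reaches the broker (init container, entrypoint script) is left open.

```python
import os
import socket


def pod_name():
    """Best-effort pod name: HOSTNAME env var, else the system hostname."""
    return os.environ.get("HOSTNAME") or socket.gethostname()


def render_clientid_lines(config_text, node):
    """Replace ${node} only on active '*.clientid' lines.

    Plain str.replace is used on purpose: topics such as '$share/...'
    contain '$' and would trip up string.Template substitution.
    """
    rendered = []
    for line in config_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key.endswith(".clientid") and not key.startswith("#"):
            line = line.replace("${node}", node)
        rendered.append(line)
    return "\n".join(rendered)


if __name__ == "__main__":
    template = (
        "bridge.mqtt.voi.clientid = mqtt_bridge_client_${node}\n"
        "bridge.mqtt.voi.subscription.1.topic = $share/grpname/r1/#"
    )
    print(render_clientid_lines(template, pod_name()))
```

Every replica can then share one config template and still connect with its own clientid.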
Thanks for your suggestion.
We will release it in the next version.
Thanks for the prompt reply!
Verified to work in v4.2. This can be closed.