strimzi/strimzi-mqtt-bridge

MQTT clients are not able to specify any key

antonio-pedro99 opened this issue · 7 comments

We are just moving bytes from the MQTT payload to the Kafka payload, but this doesn't allow the MQTT client to specify a key.
Usually the MQTT payload is something more than just bytes, e.g. a JSON that could contain a key and a value, as we have for the Strimzi HTTP bridge. The JSON could also carry more than one message/record.

This issue is just to keep track of it.

Originally posted by @ppatierno in #12 (comment)

I agree that a JSON can carry more than one message/record, but I do not think it is ideal to set record keys inside the MQTT client's message payload.

Let's assume that the IoT devices do not know anything about the Kafka cluster; the only thing they need to know is MQTT. Therefore, the clients would not be allowed to specify any key to be used in the Kafka record.

However, keys are crucial for partitioning, so it would be good if the Bridge did not rely only on round-robin to distribute records among partitions.
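For context on why that matters: Kafka's default partitioner hashes the record key (murmur2), so all records sharing a key land on the same partition and keep their relative order. A minimal Java sketch, assuming a local broker; the topic and key names here are illustrative, not part of the bridge:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key => same partition under the default partitioner,
            // so all telemetry for device 42 stays ordered on one partition.
            producer.send(new ProducerRecord<>("telemetry", "device-42", "reading=1"));
            producer.send(new ProducerRecord<>("telemetry", "device-42", "reading=2"));
        }
    }
}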

My first thought on how to approach this is to use the key in our Topic Mapping Rules.
E.g., given the ToMaR below:

{
    "mqttTopic": "race/{date}/car/{carID}/#",
    "kafkaTopic": "race_{carID}_info",
    "kafkaKey": "car_{carID}"
}

With this, if the actual MQTT topic is race/12.12.2023/car/123/stats, all the race stats for the car with ID 123 would go to the same partition inside the Kafka topic race_123_info.

On the other hand, if no key is specified in the ToMaR, maybe we should use the MQTT topic as the key?
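To make this concrete, here is a hypothetical sketch of how such a rule could be applied; the class and method names are mine, not the bridge's actual API. The placeholders in mqttTopic become named regex groups, their captured values are substituted into the kafkaKey template, and the MQTT topic itself is used as the key when no kafkaKey is set:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ToMaR key resolution; not the bridge's actual code.
public class ToMaRKeySketch {

    // Turn a template like "race/{date}/car/{carID}/#" into a regex with named groups.
    static Pattern toPattern(String mqttTopicTemplate) {
        String regex = mqttTopicTemplate
                .replaceAll("\\{(\\w+)\\}", "(?<$1>[^/]+)") // {placeholder} -> named group
                .replace("#", ".*");                        // MQTT multi-level wildcard
        return Pattern.compile(regex);
    }

    // Resolve the record key; fall back to the MQTT topic when no kafkaKey is set.
    static String resolveKey(String topicTemplate, String keyTemplate, String mqttTopic) {
        if (keyTemplate == null) {
            return mqttTopic; // proposed fallback: use the MQTT topic as the key
        }
        Matcher m = toPattern(topicTemplate).matcher(mqttTopic);
        String key = keyTemplate;
        if (m.matches()) {
            // Substitute each captured placeholder into the key template.
            Matcher ph = Pattern.compile("\\{(\\w+)\\}").matcher(keyTemplate);
            while (ph.find()) {
                key = key.replace(ph.group(0), m.group(ph.group(1)));
            }
        }
        return key;
    }

    public static void main(String[] args) {
        String key = resolveKey("race/{date}/car/{carID}/#", "car_{carID}",
                "race/12.12.2023/car/123/stats");
        System.out.println(key); // prints "car_123"
    }
}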

Let's assume that the IoT devices do not know anything about the Kafka cluster; the only thing they need to know is MQTT.

I don't think we can consider this to be true. The client HAS TO know it's interacting with Kafka; using the MQTT protocol is just a way to avoid having to know the Kafka protocol.
If we followed your statement, even the HTTP bridge would be the same ... an HTTP client doesn't know anything about Kafka keys. But that's not the case: a bridge is a bridge ... there are concepts of the destination (Kafka) that have to be known by the source (client), whatever protocol the bridge is bridging.

I had the feeling that we were creating an abstract protocol ⇾ MQTT and Kafka working as one protocol. But I got what you explained above. Bridge:

(source)-------------> Bridge ------------------->(destination)

The idea of the Bridge is clear; still, I do not think it is ideal to pass the keys inside the payload. In normal cases, MQTT clients specify the topic on which they want to publish and the body of the message they want to publish.

Having a bridge is about facilitating devices, services, and users which/who are not able to communicate with the raw Kafka protocol and the corresponding Kafka clients.
This can happen for many reasons (constrained devices where MQTT is a better fit, or developers who don't know the Kafka protocol and are more keen to work with MQTT or HTTP, for example).
But again, they want to use Kafka.
The bridge is not abstracting the usage of Kafka; it's bridging, so you know you are talking with Kafka, just with a different protocol.
Of course, this protocol (MQTT or HTTP) doesn't know about some Kafka concepts, and that's where the bridge comes into the picture. If the protocol doesn't support something out of the box to map to Kafka constructs, you have to deal with it with what you call an "abstract protocol", which is actually just a specific payload format.
Blocking devices/users from using Kafka keys is bad, because they would want to write to Kafka, partitioning data through keys, just using a different protocol.

In normal cases, MQTT clients specify the topic on which they want to publish and the body of the message they want to publish.

You don't have to think about "normal" MQTT clients, because those would just need a "normal" MQTT broker, where everything you say is true. You have to think in terms of Kafka, not MQTT, which here is just a different protocol to talk with Kafka brokers.

Blocking devices/users from using Kafka keys is bad, because they would want to write to Kafka, partitioning data through keys, just using a different protocol.

This is where the Bridge comes into the picture. For example, the user knows it is dealing with Kafka, so it could also specify the mapping rules from the MQTT client side for each message it wants to send to Kafka.

We are not totally blocking them from knowing about Kafka or doing something Kafka-specific. But for keys, I believe it would be better if the users/clients let the Bridge handle the keys, because we cannot use the MQTT fixed header or variable header to pass them, and of course I would leave the payload just for the messages.

Or is it possible to do this with the headers or in any other way?

I see this approach as too bridge-related, since the key is defined in the configuration.
While it looks good, it isn't as dynamic as I would love it to be. I mean, changing the key format means changing the config and restarting the bridge.
On the other side, it would allow having just bytes as the payload in the MQTT packet, which is what an MQTT payload generally is.
Having the key in the MQTT message would mean having a JSON like:

{
   "key": "my-key",
   "value": "<value>"
}

As is done for the HTTP bridge today.
I am conflicted ... :-)
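For comparison, a minimal sketch of how the bridge could parse such a payload, assuming Jackson for JSON and string (de)serialization; this mirrors the HTTP bridge's record format, it is not the MQTT bridge's actual code:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch only: extract an optional "key" and a "value" from the MQTT payload,
// mirroring the HTTP bridge's JSON record format.
public class PayloadKeySketch {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    static ProducerRecord<String, String> toRecord(String kafkaTopic, byte[] mqttPayload)
            throws java.io.IOException {
        JsonNode json = MAPPER.readTree(mqttPayload);
        String key = json.hasNonNull("key") ? json.get("key").asText() : null; // key is optional
        String value = json.get("value").asText();
        return new ProducerRecord<>(kafkaTopic, key, value);
    }

    public static void main(String[] args) throws java.io.IOException {
        byte[] payload = "{\"key\":\"my-key\",\"value\":\"some-value\"}".getBytes();
        ProducerRecord<String, String> record = toRecord("my-topic", payload);
        System.out.println(record.key() + " -> " + record.value()); // my-key -> some-value
    }
}

The trade-off, as noted above, is that the MQTT payload is then no longer opaque bytes but a bridge-specific format.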

As per our earlier meeting, we have agreed to pass the keys in the topic mapping rules.
Can be fixed with #25