asyncapi/bindings

Specify Avro encoding approach

dalelane opened this issue · 18 comments

There is a need to describe the Avro encoding when using Apache Avro as the payload schema format.

For context, when using an Avro schema to serialize data, a Kafka producer can choose between two ways of encoding the data:

  • json - the data is still human-readable which is sometimes useful
  • binary - the encoded data is smaller and can be faster to deserialize

A Kafka consumer needs to know which encoding mechanism was used to be able to deserialize the messages.
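To make the difference between the two encodings concrete, here is a minimal sketch in Python. It hand-rolls Avro's binary rules for one string field and one long field instead of using an Avro library, and the record shape and field names (`name`, `count`) are assumptions made up for illustration, not anything from a real schema.

```python
import json


def zigzag_varint(n: int) -> bytes:
    """Encode an integer as Avro's binary encoding does for int/long:
    zigzag first, then a little-endian base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small magnitudes to small unsigned values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_binary(record: dict) -> bytes:
    """Binary encoding of a record with fields (name: string, count: long).
    Fields are written in schema order with no field names or separators."""
    name = record["name"].encode("utf-8")
    return zigzag_varint(len(name)) + name + zigzag_varint(record["count"])


def encode_json(record: dict) -> bytes:
    """JSON encoding of the same record: a plain JSON object for these field types."""
    return json.dumps(record).encode("utf-8")


# Hypothetical record, used only to compare the two encodings.
record = {"name": "sensor-1", "count": 42}
binary_bytes = encode_binary(record)
json_bytes = encode_json(record)

print(len(binary_bytes), binary_bytes)
print(len(json_bytes), json_bytes)
```

The binary form carries no field names, so a consumer cannot interpret it without the schema, and it cannot be parsed as JSON at all. That is why the consumer has to be told which encoding was used.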

One of the most commonly used Kafka serializer libraries, Confluent's io.confluent.kafka.serializers.KafkaAvroSerializer, only supports binary encoding, so I think it's reasonable to assume binary as a default where not specified, and not try to treat this as a required field.

But it should be possible to specify when json encoding is being used with other serdes libraries, such as Apicurio's.

(related Slack question)

Fran's suggestion in Slack was that capturing this in the operation binding would allow this to be specified at the topic level.
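As a rough sketch of the "optional, defaulting to binary" behaviour proposed above, a tool reading the binding could resolve the encoding like this. The field name `avroEncoding` and the dict-shaped binding are hypothetical, chosen only for illustration; they are not part of any published binding.

```python
VALID_ENCODINGS = {"binary", "json"}


def resolve_avro_encoding(binding: dict) -> str:
    """Return the Avro encoding declared in a binding, defaulting to binary.

    "avroEncoding" is a hypothetical field name used for this sketch only.
    """
    encoding = binding.get("avroEncoding", "binary")
    if encoding not in VALID_ENCODINGS:
        raise ValueError(f"unsupported Avro encoding: {encoding!r}")
    return encoding


# A binding that says nothing is treated as binary.
print(resolve_avro_encoding({}))
# A binding that opts in to json encoding.
print(resolve_avro_encoding({"avroEncoding": "json"}))
```

Treating the field as optional keeps existing documents valid, since anything that does not mention an encoding is read as binary.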


I had a go at illustrating what this would look like. It got a bit long, so instead of putting it in a super-long comment here, I've put it in a stand-alone gist at https://gist.github.com/dalelane/3931c17b14c51fa4a1cf25496237d188

(I also addressed the related issue #41 as part of the same exploration)

This issue has been automatically marked as stale because it has not had recent activity 😴
It will be closed in 60 days if no further activity occurs. To unstale this issue, add a comment with detailed explanation.
Thank you for your contributions ❤️


@dalelane is it Kafka-specific only? Why not at the schemaFormat level?

Yeah, that's a fair point. I've only heard of this being an issue in Kafka, but in theory there's no technical reason why it could only apply to Kafka.

In the end you just need to specify in the document what content type is sent over the wire, right? Like avro/binary, and the schema format would still be json or yml, or?

@derberg When you're using a schema registry (which is the most common approach when Kafka developers are using Avro schemas), developers will specify the ID and version for the schema they're using. There are lots of different places that this can be specified.

It could be in a header. Or it could be in the message body, typically as a prefix: some number of bytes are reserved for the ID and version ahead of the rest of the message contents. But there are different conventions in the Kafka ecosystem for how many bytes are used for this.

The challenge here is that you don't know how to parse/deserialise the message body without knowing the convention that was used.

e.g. If the message publisher put the schema ID and version in the message body, and used two bytes to do this, then the message subscriber needs to know to skip the first two bytes of the message body and then use Avro to deserialize only the data after that. (Even if you already have the schema, provided through AsyncAPI, to deserialize at all you still need to know to skip the first couple of bytes, as they're not part of the Avro-serialized data.)
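A minimal Python sketch of the two-byte example above. The layout (one byte for the schema ID, one byte for the version) is an assumption made purely for illustration; real serdes libraries use their own conventions, which is exactly the thing the consumer has to be told.

```python
from typing import Tuple


def split_prefixed_payload(message: bytes, prefix_len: int) -> Tuple[bytes, bytes]:
    """Split a message body into (schema prefix, Avro-serialized data).

    prefix_len depends on the convention the producer used, so it cannot
    be inferred from the bytes themselves.
    """
    if len(message) < prefix_len:
        raise ValueError("message is shorter than the expected schema prefix")
    return message[:prefix_len], message[prefix_len:]


# Hypothetical message: a 2-byte prefix followed by stand-in Avro binary data.
message = bytes([7, 3]) + b"\x10sensor-1T"

prefix, avro_body = split_prefixed_payload(message, prefix_len=2)

# Assumed layout for this sketch only: byte 0 = schema ID, byte 1 = version.
schema_id, schema_version = prefix[0], prefix[1]

print(schema_id, schema_version, avro_body)
```

Only `avro_body` should be handed to the Avro decoder. Passing the whole message, or skipping the wrong number of bytes, would make the decoder read the prefix as data.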

It does sound tricky. I would love to see this at the code level; it might help us figure out how to present it in the AsyncAPI file so that the right code can be generated from it.

Did you explore the Java template we have (not the Spring one)? It needs some love and we are looking for maintainers there 😆

@derberg Where is the Java template, please? I was only aware of the two Spring generators (Java Spring and Java Spring Cloud Stream)

Oh, sorry for the confusion: by "not Spring one" I meant not Java Spring Cloud Stream.

@derberg Brill, thanks. I had a go at adding support for the new security scheme types as a way of familiarising myself with the generator

This issue has been automatically marked as stale because it has not had recent activity 😴

It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation.

There can be many reasons why some specific issue has no activity. The most probable cause is lack of time, not lack of interest. AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company. It is a community-driven initiative ruled under open governance model.

Let us figure out together how to push this issue forward. Connect with us through one of many communication channels we established here.

Thank you for your patience ❤️


@dalelane are you still working on this one?

This will be addressed by #115


closing as I think this has been covered by the changes in #115