RegioHelden/django-kafka

Utilise automatic schema registration and retrieval

Closed this issue · 0 comments

There are two problems related to how we handle schema fetching:

  1. Serialization (Producing):

AvroTopic.key_schema and AvroTopic.value_schema currently fetch the latest schema for the topic.

This is a little unfavourable, as it requires clients to register the schema before producing any messages. But AvroSerializer already does this for us, so we can make it do the heavy lifting.

Therefore for serialization key_schema and value_schema should return an avro.Schema object (as defined by subclasses of AvroTopic depending on the message contents) which is then auto-registered by AvroSerializer

  1. Deserialization (Consuming)

The deserializers currently use the schema returned by key_schema and value_schema which are fixed to be the latest schema version registered under the topic name {topic}-key/value.

Instead we can remove the fixed schema_str argument and let the AvroDeserializer automatically determine and retrieve the schema based on the schema ID which is contained in the 1-4 bytes of the kafka message. See:

https://docs.confluent.io/platform/current/schema-registry/fundamentals/serdes-develop/index.html#wire-format