Unify how a Subject is associated with a topic
eliax1996 opened this issue · 0 comments
Issue Description
Current Status
Let's clarify the required definitions to understand the issue properly:
- Topic: This is a mechanism used by Kafka to group different messages.
- Schema: It's a description of the structure that a message must adhere to.
- Subject: An entity required by the Schema Registry to establish a relationship between a schema and a topic. It's used to check whether a specific message may be published to a topic.
The expected flow, given the entities introduced above, is as follows:
- We expect the user to create a schema.
- The user should register the schema with a specific topic using a designated Subject.
- The user should publish a message to a specific topic using a certain Subject.
  - This implies verifying that the Subject is allowed/registered to publish to that topic.
  - It also involves verifying that the message structure complies with the schema associated with the Subject.
To summarize the types of relationships we could have:
- One topic could be associated with one or more subjects.
- One subject could be associated with one or more topics.
- One subject is always associated with one schema.
To calculate the subject deterministically, we need:
- The policy name (an enum representing how we calculate the subject from the provided inputs, listed here).
- The topic name.
- The record name (the namespace of the record).
- A prefix for the subject (used to differentiate between key and value schemas, particularly in the `PROTOBUF` case).
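To make the inputs above concrete, here is a minimal sketch of a deterministic subject computation. The enum `SubjectPolicy`, the function `compute_subject`, and the exact string formats are assumptions for illustration: they follow the commonly used Schema Registry naming convention (`<topic>-key` / `<topic>-value` for the topic-name strategy) and are not necessarily what this project implements.

```python
from enum import Enum
from typing import Optional


class SubjectPolicy(Enum):
    # Hypothetical enum; the values mirror the strategy names used in this issue.
    TOPIC_NAME = "topic_name"
    RECORD_NAME = "record_name"
    TOPIC_RECORD_NAME = "topic_record_name"


def compute_subject(
    policy: SubjectPolicy,
    topic: str,
    record_name: Optional[str],
    subject_type: str,
) -> str:
    """Derive a subject name from the four inputs (formats are assumed).

    subject_type is "key" or "value" and is only used by TOPIC_NAME here.
    """
    if policy is SubjectPolicy.TOPIC_NAME:
        return f"{topic}-{subject_type}"
    if record_name is None:
        # The remaining policies cannot be computed without the record name.
        raise ValueError(f"{policy.value} requires a record name")
    if policy is SubjectPolicy.RECORD_NAME:
        return record_name
    return f"{topic}-{record_name}"
```

For example, `compute_subject(SubjectPolicy.TOPIC_NAME, "orders", None, "value")` yields `"orders-value"` under these assumptions, while the two record-based policies fail without a record name.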
Current Issue
Currently, we have only one policy to associate a Subject with a topic, known as `topic_name`. This strategy is used to calculate the subject for a given schema.
With this specific policy, the relationship between topic <-> schema <-> subject is bi-directional (i.e., these entities have a one-to-one relationship). Therefore, given the topic and the schema, we can automatically compute the associated subject.
We use this property to enable users to produce messages by providing only the schema ID as a parameter, instead of the entire schema. This allows us to retrieve the schema based on its ID, calculate the unique associated subject, and check if the subject is registered for the targeted schema.
This approach works well for all cases except for Protobuf. Currently, we query the database for the schema and check if it's associated with the targeted topic, without verifying whether it's registered as a key or a value schema (meaning a user can switch between key and value by simply using the schema body). This issue needs to be addressed.
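A minimal sketch of the ID-based check, including the key/value distinction that the Protobuf path currently skips, could look like the following. The in-memory `REGISTERED` mapping, the function `is_schema_allowed`, and the `<topic>-key` / `<topic>-value` subject format are hypothetical stand-ins for the real database lookup.

```python
from typing import Dict, Set

# Hypothetical stand-in for the registry database:
# subject name -> IDs of the schemas registered under that subject.
REGISTERED: Dict[str, Set[int]] = {
    "orders-key": {1},
    "orders-value": {2},
}


def is_schema_allowed(topic: str, schema_id: int, subject_type: str) -> bool:
    """Check a schema ID against the subject computed for the topic.

    subject_type is "key" or "value". Because it is part of the computed
    subject, a key schema cannot be used as a value schema or vice versa.
    """
    subject = f"{topic}-{subject_type}"  # assumed topic_name format
    return schema_id in REGISTERED.get(subject, set())
```

With this data, `is_schema_allowed("orders", 1, "key")` passes while `is_schema_allowed("orders", 1, "value")` is rejected, which is the behavior the Protobuf case would need.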
A more long-term design problem is that this property holds only when the strategy is `topic_name`. In the future, before producing a message from schema IDs alone, we need to check whether the subject can be computed directly or whether the schema is also required (as with the `record_name` or `topic_record_name` strategies).
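As a rough illustration of that pre-check, the helper below reports whether a strategy needs the schema itself before the subject can be derived. The function name `needs_schema_for_subject` is hypothetical; only the three strategy names come from this issue.

```python
def needs_schema_for_subject(strategy: str) -> bool:
    """Return True when the subject cannot be derived from the topic alone."""
    if strategy == "topic_name":
        # The subject depends only on the topic (and key/value role).
        return False
    if strategy in ("record_name", "topic_record_name"):
        # The record name lives in the schema, so the schema must be loaded.
        return True
    raise ValueError(f"unknown strategy: {strategy}")
```

A producer endpoint could call this first and fall back to requiring the full schema (or rejecting the request) whenever it returns `True`.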
A more general solution would be to formalize the Subject object and assign it an ID.
This way, even if the relationship between subject and schema is not unique, we can directly verify if the subject is allowed and if the message structure complies with the schema by retrieving the subject from the database and then obtaining the associated schema using the subject ID.
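One possible shape for such a formalized Subject is sketched below. Everything here is an assumption for discussion: the `Subject` dataclass, the `SUBJECTS` and `TOPIC_SUBJECTS` tables (standing in for database tables), and the function `resolve_schema_id`.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Subject:
    # Hypothetical record: a subject with its own ID pointing at one schema.
    id: int
    name: str
    schema_id: int


# Hypothetical tables: subjects by ID, and the subject IDs allowed per topic.
SUBJECTS: Dict[int, Subject] = {10: Subject(10, "com.example.Order", 2)}
TOPIC_SUBJECTS: Dict[str, Set[int]] = {"orders": {10}, "returns": {10}}


def resolve_schema_id(topic: str, subject_id: int) -> int:
    """Verify the subject is allowed on the topic, then return its schema ID."""
    if subject_id not in TOPIC_SUBJECTS.get(topic, set()):
        raise PermissionError(f"subject {subject_id} is not registered for {topic}")
    return SUBJECTS[subject_id].schema_id
```

Here the same subject is attached to two topics, so the subject cannot be recovered from the schema alone, yet the lookup by subject ID still answers both questions: whether publishing is allowed and which schema the message must follow.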