Schema representation - 'importing' types
The specification states the following:
A schema or protocol may not contain multiple definitions of a fullname. Further, a name must be defined before it is used (“before” in the depth-first, left-to-right traversal of the JSON parse tree, where the types attribute of a protocol is always deemed to come “before” the messages attribute.)
This means that the following schema is valid (the A type is defined in b_field_one and is only referenced by name, not redefined, in b_field_two):
{
"name":"B",
"type":"record",
"fields":[
{
"name":"b_field_one",
"type":{"name":"A","type":"record","fields":[]}
},
{
"name":"b_field_two",
"type":"A"
}
]
}
This crate makes it easy to load schemata as a sequence of schemas (Schema::parse_list), where:
- The ordering rule is relaxed (no need to load the schemata in a particular order)
- The defined-once rule is applied to the whole sequence, instead of per schema
However, I can't find a way to output an individual schema from the schemata in a 'complete' form (i.e. one that does not depend on other schemas and otherwise complies with the rules above). For example:
let schema_str_1 = r#"{
"name": "A",
"doc": "A's schema",
"type": "record",
"fields": [
]
}"#;
let schema_str_2 = r#"{
"name": "B",
"doc": "B's schema",
"type": "record",
"fields": [
{"name": "b_field_one", "type": "A"},
{"name": "b_field_two", "type": "A"}
]
}"#;
let schema_strs = [schema_str_1, schema_str_2];
let schemata = Schema::parse_list(&schema_strs)?;
for d in schemata {
println!("{}",d.canonical_form());
}
This gives the following canonical forms (A looks fine to me, but B is problematic because it contains no definition of A):
{"name":"A","type":"record","fields":[]}
{"name":"B","type":"record","fields":[{"name":"b_field_one","type":"A"},{"name":"b_field_two","type":"A"}]}
I believe the canonical form for B should actually be:
{"name":"B","type":"record","fields":[{"name":"b_field_one","type":{"name":"A","type":"record","fields":[]}},{"name":"b_field_two","type":"A"}]}
Do we have a way of producing this 'correct' form? This is necessary for fingerprint calculation and some schema registry interactions.
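To make the request concrete, here is a minimal sketch of the inlining I have in mind. It uses a toy model, not the crate's types: `Ty`, `render`, and `example_b_canonical` are hypothetical names made up for illustration, and only records and name references are handled. A referenced type is inlined at its first use in depth-first, left-to-right order and emitted as a bare name afterwards, which also covers nested references.

```rust
use std::collections::{HashMap, HashSet};

// Toy schema model for illustration only; these are NOT the crate's types.
#[derive(Clone, Debug)]
enum Ty {
    // Reference to a named type, e.g. "type": "A".
    Ref(String),
    // Record definition with (field name, field type) pairs.
    Record { name: String, fields: Vec<(String, Ty)> },
}

// Render `ty` as compact JSON. Each named type found in `defs` is inlined at
// its first use (depth-first, left-to-right) and written as a bare name on
// every later use, so it is defined exactly once and before it is used.
fn render(ty: &Ty, defs: &HashMap<String, Ty>, seen: &mut HashSet<String>) -> String {
    match ty {
        Ty::Ref(name) => match defs.get(name) {
            // First use of a known name: inline the full definition here.
            Some(def) if !seen.contains(name) => {
                seen.insert(name.clone());
                render(def, defs, seen)
            }
            // Already defined earlier, or unknown: keep the bare reference.
            _ => format!("\"{}\"", name),
        },
        Ty::Record { name, fields } => {
            // Mark as defined before descending, so recursive types terminate.
            seen.insert(name.clone());
            let rendered: Vec<String> = fields
                .iter()
                .map(|(fname, fty)| {
                    format!("{{\"name\":\"{}\",\"type\":{}}}", fname, render(fty, defs, seen))
                })
                .collect();
            format!(
                "{{\"name\":\"{}\",\"type\":\"record\",\"fields\":[{}]}}",
                name,
                rendered.join(",")
            )
        }
    }
}

// Builds the A/B example from this issue and renders B on its own.
fn example_b_canonical() -> String {
    let a = Ty::Record { name: "A".to_string(), fields: vec![] };
    let b = Ty::Record {
        name: "B".to_string(),
        fields: vec![
            ("b_field_one".to_string(), Ty::Ref("A".to_string())),
            ("b_field_two".to_string(), Ty::Ref("A".to_string())),
        ],
    };
    let mut defs = HashMap::new();
    defs.insert("A".to_string(), a);
    render(&b, &defs, &mut HashSet::new())
}
```

With this toy model, `example_b_canonical()` yields the form I proposed for B above: A is defined inside b_field_one and referenced by name in b_field_two. How this would map onto the crate's own schema representation is left open here.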
Sorry about the noise - I've added a fix for nested Refs (where a schema depends on another that depends on another), and re-opened the PR. Added a test for this as well.
The goal here is ultimately to support interop with schema registries where each schema is stored independently of other schemata in the registry.
I haven't changed the current functionality (Schema::canonical_form()), except that you'll notice a change in two test cases:
- test_avro_3370_record_schema_with_currently_parsing_schema_named_enum()
- test_avro_3370_record_schema_with_currently_parsing_schema_named_fixed()
In both cases, the 'expected' canonical form was (I believe) incorrect, because it included two (duplicate) definitions of the same type.