Parsing more complex JSON schemas to Avro
beligum opened this issue · 2 comments
Hi,
I'd like to help improve the json-to-avro converter.
I'm trying to convert the official European Broadcasting Union metadata ontology (EBUCore, an extension of Dublin Core) to an Avro schema. You can download the XSD schemas here: https://www.ebu.ch/metadata/schemas/EBUCore/ebucore.zip
I've successfully converted the ebucore.xsd file (with its two dependencies) to a JSON schema in two steps: first I generated a Java class hierarchy with XJC (JAXB), then I produced a JSON schema from it with com.fasterxml.jackson.module.jsonSchema, using this code:
// Honour both the JAXB annotations on the XJC-generated classes and regular Jackson annotations
AnnotationIntrospectorPair introspector = new AnnotationIntrospectorPair(
        new JaxbAnnotationIntrospector(TypeFactory.defaultInstance()),
        new JacksonAnnotationIntrospector());
ObjectMapper objectMapper2 = new ObjectMapper();
objectMapper2.setAnnotationIntrospector(introspector);
// Walk the root type and build a JSON schema for it
SchemaFactoryWrapper visitor = new SchemaFactoryWrapper();
objectMapper2.acceptJsonFormatVisitor(objectMapper2.constructType(EbuCoreMainType.class), visitor);
JsonSchema jsonSchema2 = visitor.finalSchema();
// Replace any previously generated file
if (jsonSchemaFile.exists()) {
    jsonSchemaFile.delete();
}
ObjectWriter fileSchemaWriter = objectMapper2.writerWithDefaultPrettyPrinter();
fileSchemaWriter.writeValue(jsonSchemaFile, jsonSchema2);
The resulting file is attached to this issue: ebucore.json
I know the Avro converter is complex and not production-ready yet, but I think this schema would be a good test case for maturing it. This is how I convert the schema to Avro:
// jsonSchema: the contents of ebucore.json, parsed as a Jackson JsonNode
final SchemaTree tree = new CanonicalSchemaTree(SchemaKey.forJsonRef(JsonRef.emptyRef()), jsonSchema);
final ValueHolder<SchemaTree> input = ValueHolder.hold("schema", tree);
AvroWriterProcessor processor = new AvroWriterProcessor();
// Print processing messages (DEBUG level and above) to the console
ProcessingReport report = new ConsoleProcessingReport(LogLevel.DEBUG);
Schema jsonSchemaToAvro = processor.process(report, input).getValue();
The first thing I had to do, to get past a weird crash, was upgrade json-schema-validator in json-schema-avro to version 2.2.6, using this issue as a guide (I upgraded Avro to v1.7.7 at the same time).
Now the processor kicks off, but dies immediately with: com.github.fge.jsonschema.core.exceptions.ProcessingException: fatal: no suitable processor found
Digging around in AvroPredicates.record(), I found that when the "additionalProperties" keyword is missing, an object schema is not recognised as a record, which I think is a bug. So I changed the default passed to .asBoolean in the corresponding code to false.
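A minimal sketch of the predicate logic described above, for illustration only: the class and method names are hypothetical, this is not the actual AvroPredicates.record() code, and a parsed schema node is modelled as a Map<String, Object> instead of Jackson's JsonNode. It shows how the default used for an absent "additionalProperties" decides whether a plain object schema counts as a record:

```java
import java.util.Map;

// Hypothetical sketch, NOT the real AvroPredicates.record() implementation.
// A parsed JSON schema node is modelled as a Map<String, Object> so the example
// needs no third-party libraries.
final class RecordPredicateSketch {

    // Returns "additionalProperties" when it is a boolean; otherwise (keyword
    // absent, or holding a sub-schema object) the supplied default is used.
    static boolean additionalProperties(Map<String, Object> schema, boolean defaultValue) {
        Object value = schema.get("additionalProperties");
        return value instanceof Boolean ? (Boolean) value : defaultValue;
    }

    // Treats an object schema as an Avro record only when additional properties
    // are disallowed. With additionalPropertiesDefault == true (the JSON Schema
    // meaning of an absent keyword), a schema without the keyword is rejected;
    // with false, it is accepted.
    static boolean isRecord(Map<String, Object> schema, boolean additionalPropertiesDefault) {
        return "object".equals(schema.get("type"))
                && !additionalProperties(schema, additionalPropertiesDefault);
    }

    public static void main(String[] args) {
        Map<String, Object> noKeyword = Map.of("type", "object");
        System.out.println(isRecord(noKeyword, true));  // false
        System.out.println(isRecord(noKeyword, false)); // true
    }
}
```

Defaulting to false means an object schema that says nothing about additional properties becomes a fixed-field Avro record, even though under JSON Schema rules it would accept extra properties that such a record cannot carry.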
Now a fair amount of the schema gets parsed, but the conversion crashes on the many $ref occurrences in the schema, always with "com.github.fge.jsonschema.core.exceptions.ProcessingException: fatal: no suitable processor found", because no predicate matches nodes such as:
"items" : {
"type" : "object",
"$ref" : "urn:jsonschema:com:beligum:blocks:schema:dublincore:jaxb:ElementType"
}
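One possible way to handle such nodes is sketched below. It assumes that jackson-module-jsonSchema emits each type's full definition once, carrying an "id" URN, and afterwards refers to it with a "$ref" to the same URN. The idea is to index the definitions by "id" and map each URN to an Avro full name, so that repeated occurrences can be emitted as references to an already defined named record (Avro lets a named type be referenced by its full name once it has been defined). The class and method names are hypothetical, they are not part of json-schema-avro, and the schema is modelled with plain Maps and Lists rather than JsonNode:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of json-schema-avro or jackson-module-jsonSchema.
final class UrnRefSketch {

    private static final String PREFIX = "urn:jsonschema:";

    // "urn:jsonschema:com:example:Foo" -> "com.example.Foo" (an Avro full name
    // with namespace "com.example" and name "Foo").
    static String toAvroFullName(String urn) {
        if (!urn.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a urn:jsonschema id: " + urn);
        }
        return urn.substring(PREFIX.length()).replace(':', '.');
    }

    // Recursively records every (sub-)schema whose "id" is a string, keyed by that id.
    static void indexIds(Object node, Map<String, Map<String, Object>> index) {
        if (node instanceof Map) {
            @SuppressWarnings("unchecked")
            Map<String, Object> map = (Map<String, Object>) node;
            Object id = map.get("id");
            if (id instanceof String) {
                index.put((String) id, map);
            }
            for (Object child : map.values()) {
                indexIds(child, index);
            }
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                indexIds(child, index);
            }
        }
    }

    // For a node such as {"type": "object", "$ref": "urn:jsonschema:..."}, returns the
    // Avro full name to emit as a type reference, or null if the node has no $ref.
    // Fails if the $ref points to an id that was never defined.
    static String refTypeName(Map<String, Object> node, Map<String, Map<String, Object>> index) {
        Object ref = node.get("$ref");
        if (!(ref instanceof String)) {
            return null;
        }
        if (!index.containsKey(ref)) {
            throw new IllegalStateException("Unresolved $ref: " + ref);
        }
        return toAvroFullName((String) ref);
    }

    public static void main(String[] args) {
        Map<String, Object> element = new HashMap<>();
        element.put("type", "object");
        element.put("id", "urn:jsonschema:com:example:ElementType");
        Map<String, Object> items = new HashMap<>();
        items.put("type", "object");
        items.put("$ref", "urn:jsonschema:com:example:ElementType");
        Map<String, Object> root = Map.of("first", element, "list", List.of(items));

        Map<String, Map<String, Object>> index = new HashMap<>();
        indexIds(root, index);
        System.out.println(refTypeName(items, index)); // com.example.ElementType
    }
}
```

A converter built this way would emit a full record named toAvroFullName(id) the first time it meets a given id and just that name string for every later $ref; since Avro only resolves a name after its definition, the traversal order has to match the order in which the definitions appear.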
So I'm stuck and could use a helping hand. Does anyone want to have a look at this, or point me in the right direction to make this tool better?
cheers,
b.
I am very interested in development related to this functionality myself.
Is there any update on how ready the JSON Schema to Avro schema conversion is for complex records, optional fields and arrays?