fge/json-schema-avro

Parsing more complex JSON schemas to Avro

beligum opened this issue · 2 comments

Hi,

I'd like to help improve the json-to-avro converter.
I'm trying to convert the official European Broadcasters metadata ontology (EBUCore, an extension of Dublin Core) to an Avro schema. You can download the XSD schemas here: https://www.ebu.ch/metadata/schemas/EBUCore/ebucore.zip

I've successfully converted the ebucore.xsd file (with its two dependencies) to a JSON schema by first converting it to a Java class hierarchy using XJC (JAXB) and then to a JSON schema using com.fasterxml.jackson.module.jsonSchema and this code:

        // Use the JAXB annotations first and fall back to the Jackson ones
        AnnotationIntrospectorPair introspector = new AnnotationIntrospectorPair(
                new JaxbAnnotationIntrospector(TypeFactory.defaultInstance()),
                new JacksonAnnotationIntrospector());

        ObjectMapper objectMapper2 = new ObjectMapper();
        objectMapper2.setAnnotationIntrospector(introspector);

        // Visit the JAXB-generated root class and collect its JSON schema
        SchemaFactoryWrapper visitor = new SchemaFactoryWrapper();
        objectMapper2.acceptJsonFormatVisitor(objectMapper2.constructType(EbuCoreMainType.class), visitor);
        JsonSchema jsonSchema2 = visitor.finalSchema();

        // Replace any previous output file and pretty-print the schema into it
        if (jsonSchemaFile.exists()) {
            jsonSchemaFile.delete();
        }
        ObjectWriter fileSchemaWriter = objectMapper2.writerWithDefaultPrettyPrinter();
        fileSchemaWriter.writeValue(jsonSchemaFile, jsonSchema2);

The resulting file is attached to this issue: ebucore.json

I know the Avro converter is complex and not production ready yet, but I think this might be a good example to try to mature it. This is what I do to convert the schema to Avro:

        final SchemaTree tree = new CanonicalSchemaTree(SchemaKey.forJsonRef(JsonRef.emptyRef()), jsonSchema);
        final ValueHolder<SchemaTree> input = ValueHolder.hold("schema", tree);
        AvroWriterProcessor processor = new AvroWriterProcessor();

        ProcessingReport report = new ConsoleProcessingReport(LogLevel.DEBUG);
        Schema jsonSchemaToAvro = processor.process(report, input).getValue();

The first thing I had to do, to resolve a weird crash, was to upgrade json-schema-validator in json-schema-avro to version 2.2.6 using this issue as a guide. (I upgraded to Avro v1.7.7 at the same time.)

Now the processor kicks off, but dies immediately with: com.github.fge.jsonschema.core.exceptions.ProcessingException: fatal: no suitable processor found

Digging around in AvroPredicates.record(), I found that a missing "additionalProperties" keyword leads to records not being recognised, which I think is a bug. So I changed the default in the corresponding code (the .asBoolean call) to false.
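To illustrate why the default matters, here is a minimal sketch of such a check. It is not the library's actual code: the class RecordPredicateSketch, the method looksLikeRecord and the Map-based schema representation are all hypothetical, chosen so the example runs with the JDK alone. It only shows how the fallback value used for an absent "additionalProperties" decides whether an object schema is accepted as a record.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a "is this schema an Avro record?" check.
final class RecordPredicateSketch {

    // defaultAdditional is the value assumed when "additionalProperties" is absent.
    static boolean looksLikeRecord(Map<String, Object> schema, boolean defaultAdditional) {
        if (!"object".equals(schema.get("type"))) {
            return false;
        }
        Object additional = schema.containsKey("additionalProperties")
                ? schema.get("additionalProperties")
                : Boolean.valueOf(defaultAdditional);
        // Only a closed object (additionalProperties == false) counts as a record here.
        return Boolean.FALSE.equals(additional);
    }

    public static void main(String[] args) {
        Map<String, Object> schema = new HashMap<>();
        schema.put("type", "object"); // no "additionalProperties" keyword at all

        System.out.println(looksLikeRecord(schema, true));  // false: rejected with a default of true
        System.out.println(looksLikeRecord(schema, false)); // true: accepted with a default of false
    }
}
```

With a default of true, a schema that simply omits the keyword (as the Jackson-generated one does) never matches; with a default of false it does.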

Now a fair amount of the schema gets parsed, but it crashes on the many $ref occurrences in the schema, always resulting in a "com.github.fge.jsonschema.core.exceptions.ProcessingException: fatal: no suitable processor found" because no matching predicate is found for e.g.

"items" : {
  "type" : "object",
  "$ref" : "urn:jsonschema:com:beligum:blocks:schema:dublincore:jaxb:ElementType"
}
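One possible direction, sketched here and not tested against json-schema-avro, would be to inline every "$ref" before handing the schema to the processor. The sketch assumes that each referenced type is declared once elsewhere in the document with a matching "id" value; that is an assumption about the generated file, not something verified here. The class RefInlinerSketch and its methods are hypothetical, and the schema is modelled as nested java.util Maps and Lists so the example needs no extra libraries. Recursive types cannot be inlined, so the sketch throws when it meets one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-processing step: replace "$ref" nodes by the definition they point to.
final class RefInlinerSketch {

    // Records every object node that declares an "id", keyed by that id.
    @SuppressWarnings("unchecked")
    static void indexIds(Object node, Map<String, Map<String, Object>> index) {
        if (node instanceof Map) {
            Map<String, Object> map = (Map<String, Object>) node;
            Object id = map.get("id");
            if (id instanceof String) {
                index.put((String) id, map);
            }
            for (Object child : map.values()) {
                indexIds(child, index);
            }
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) {
                indexIds(child, index);
            }
        }
    }

    // Returns a copy of node in which each resolvable "$ref" is replaced by its target.
    // "active" holds the refs currently being expanded, to detect recursion.
    @SuppressWarnings("unchecked")
    static Object inline(Object node, Map<String, Map<String, Object>> index, Set<String> active) {
        if (node instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object child : (List<?>) node) {
                out.add(inline(child, index, active));
            }
            return out;
        }
        if (!(node instanceof Map)) {
            return node; // strings, numbers, booleans, null
        }
        Map<String, Object> map = (Map<String, Object>) node;
        Object ref = map.get("$ref");
        if (ref instanceof String && index.containsKey(ref)) {
            String key = (String) ref;
            if (!active.add(key)) {
                throw new IllegalStateException("recursive $ref cannot be inlined: " + key);
            }
            Object resolved = inline(index.get(key), index, active);
            active.remove(key);
            return resolved;
        }
        // No ref, or a ref without a known target: copy the node and recurse into it.
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : map.entrySet()) {
            out.put(e.getKey(), inline(e.getValue(), index, active));
        }
        return out;
    }

    // Convenience entry point: index the whole document, then inline it.
    static Object inlineAll(Object root) {
        Map<String, Map<String, Object>> index = new LinkedHashMap<>();
        indexIds(root, index);
        return inline(root, index, new HashSet<>());
    }
}
```

The result of inlineAll would then be converted back to a JsonNode before building the schema tree. Whether the processor accepts the expanded schema is a separate question that this sketch does not answer.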

So I'm stuck and could use a helping hand. Does anyone want to have a look at this or guide me in the right direction to make this tool better?

cheers,

b.

I am very interested in development related to this functionality myself.

Is there any update on the readiness of the JSON schema to Avro schema conversion for complex records, optionals and arrays?