You can obtain Jasvorno from Maven Central.
A library for serializing/deserializing arbitrary JSON to the Avro format using a Schema. Although Avro already has some built-in JSON support, it has a number of limitations that cannot be addressed in a backwards-compatible way. Jasvorno offers supplementary classes that allow any JSON document structure to be used with Avro, with robust checks for conformity to a schema. See the 'Avro limitations' section for more information.
// JSON to Avro: parse the document, then convert it against the schema
JsonNode datum = new ObjectMapper().readTree(jsonString);
Object avro = JasvornoConverter.convertToAvro(GenericData.get(), datum, schema);
// Avro to JSON: write the datum through a JasvornoEncoder
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Encoder encoder = new JasvornoEncoder(schema, outputStream);
new GenericDatumWriter<Object>(schema).write(avro, encoder);
encoder.flush();
String json = new String(outputStream.toByteArray(), StandardCharsets.UTF_8);
Given the following schema, compare compliant JSON document structures used by the standard Avro JSON encoding and Jasvorno's encoding. Note that the Jasvorno documents are not polluted with union type indexes.
{"type": "record", "name": "myrecord",
"fields" : [
{"name": "id", "type": "long"},
{"name": "val", "type": ["string", "long", "null"]}
]
}
Standard Avro JSON encoding:
{"id": 1, "val": {"string": "hello"}}
{"id": 1, "val": {"long": 2}}
{"id": 1, "val": null}
Jasvorno JSON encoding:
{"id": 1, "val": "hello"}
{"id": 1, "val": 2}
{"id": 1, "val": null}
{"id": 1}
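The difference between the two document styles can be sketched in plain Java. This is only an illustration of the wrapping rule for union values, not code from Jasvorno or Avro; the class and method names (`UnionEncodingSketch`, `avroUnionValue`, `plainUnionValue`, `document`) are hypothetical, and values are passed around as pre-serialized JSON fragments to keep the example dependency-free.

```java
class UnionEncodingSketch {

  /**
   * Standard Avro JSON encoding of a union value: null is written as a bare
   * null, any other value is wrapped in a single-entry object keyed by the
   * name of the union branch.
   */
  static String avroUnionValue(String branchType, String jsonValue) {
    if ("null".equals(jsonValue)) {
      return "null";
    }
    return "{\"" + branchType + "\": " + jsonValue + "}";
  }

  /** Plain style, as in the Jasvorno documents: the value is written as-is. */
  static String plainUnionValue(String jsonValue) {
    return jsonValue;
  }

  /** Builds a document for the 'myrecord' schema from an id and a 'val' fragment. */
  static String document(long id, String valJson) {
    return "{\"id\": " + id + ", \"val\": " + valJson + "}";
  }

  public static void main(String[] args) {
    // Same logical record, two document structures.
    System.out.println(document(1, avroUnionValue("string", "\"hello\"")));
    System.out.println(document(1, plainUnionValue("\"hello\"")));
  }
}
```

The wrapper object is what carries the branch choice in the standard encoding; without it, the reader has to infer the branch from the value itself.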
Avro already has built-in JSON coders, JsonDecoder and JsonEncoder, but these require Avro-specific document structures, primarily for handling union types. Avro also offers support for free-form JSON document structures with the org.apache.avro.data.Json type, but this only checks for general JSON compliance (i.e. that it is a valid JSON document) rather than the structure of the node tree that the document declares. These limitations prevent the direct use of arbitrary JSON document structures in a strict, schema-enforced manner.
This is a problem because it requires either that users transform their JSON into Avro-compatible forms, or that Avro-specific implementation details leak out into users' JSON models. Jasvorno solves this.
Jasvorno does not currently follow Avro's encoder/decoder symmetry. Instead we use the JasvornoConverter in place of a Decoder implementation. This is because, in the absence of the union indexes present in Avro's own JSON document structures, we must preemptively explore the document tree to determine the best Schema match. It is therefore easier to read the entire JSON document up front and then check it for Schema compliance, converting to Avro along the way. A potential problem is that this approach might be expensive for schemas containing many deep union types. Additionally, there are type constructs that are impossible to disambiguate by referencing JSON node values alone; concretely, any union containing both bytes and string. In this event we favour the string type, but we also provide the com.hotels.jasvorno.schema.SchemaValidator should you wish to defensively detect these ambiguous constructs in your schemas.
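To make the ambiguity concrete, here is a minimal sketch of choosing a union branch from a value alone. It is not Jasvorno's implementation: the `Branch` enum, `resolve`, and `isAmbiguous` are hypothetical names, only four Avro types are modelled, and Java objects stand in for JSON nodes. It relies on the fact that Avro's JSON encoding represents both `string` and `bytes` as JSON strings, which is why a text value cannot tell them apart.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class UnionMatchSketch {

  /** A small, hypothetical subset of Avro types used as union branches. */
  enum Branch { NULL, LONG, STRING, BYTES }

  /** True if a value of this Java type could belong to the given branch. */
  static boolean accepts(Branch branch, Object value) {
    switch (branch) {
      case NULL:
        return value == null;
      case LONG:
        return value instanceof Long || value instanceof Integer;
      case STRING:
      case BYTES:
        // Both are JSON strings on the wire, so a text value matches either.
        return value instanceof String;
      default:
        return false;
    }
  }

  /** Picks the first accepting branch, but prefers any non-bytes match over bytes. */
  static Optional<Branch> resolve(List<Branch> union, Object value) {
    Branch bytesMatch = null;
    for (Branch branch : union) {
      if (!accepts(branch, value)) {
        continue;
      }
      if (branch == Branch.BYTES) {
        bytesMatch = branch; // remember it, but keep looking for string
        continue;
      }
      return Optional.of(branch);
    }
    return Optional.ofNullable(bytesMatch);
  }

  /** A union is ambiguous for text values if it contains both string and bytes. */
  static boolean isAmbiguous(List<Branch> union) {
    return union.contains(Branch.STRING) && union.contains(Branch.BYTES);
  }

  public static void main(String[] args) {
    List<Branch> union = Arrays.asList(Branch.BYTES, Branch.STRING, Branch.NULL);
    System.out.println(resolve(union, "hello")); // string is favoured over bytes
    System.out.println(isAmbiguous(union));
  }
}
```

A check like `isAmbiguous` run over every union in a schema is the kind of defensive validation the paragraph above describes, applied before any documents are converted.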
Although Jasvorno does not directly deal with schema evolution and compatibility, these concepts are common in systems that use Avro. This is of particular concern when encountering fields that are present in a JSON document but not declared in the schema. Under some schema compatibility modes such fields are erroneous, yet under others they are expected. To model these different situations appropriately, Jasvorno allows you to specify an UndeclaredFieldBehaviour when constructing a JasvornoConverter.
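The two situations can be illustrated with a small stand-alone sketch. It does not use Jasvorno's API: the `Behaviour` enum and its `REJECT` and `IGNORE` constants are hypothetical stand-ins (the constants of the real UndeclaredFieldBehaviour are not listed here), and a `Map` stands in for a parsed JSON object.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class UndeclaredFieldSketch {

  /** Hypothetical policy for document fields that the schema does not declare. */
  enum Behaviour { REJECT, IGNORE }

  /**
   * Returns the declared fields of the document. Undeclared fields either
   * cause a failure (REJECT) or are silently dropped (IGNORE).
   */
  static Map<String, Object> filter(
      Set<String> declaredFields, Map<String, Object> document, Behaviour behaviour) {
    Map<String, Object> accepted = new LinkedHashMap<>();
    for (Map.Entry<String, Object> field : document.entrySet()) {
      if (declaredFields.contains(field.getKey())) {
        accepted.put(field.getKey(), field.getValue());
      } else if (behaviour == Behaviour.REJECT) {
        throw new IllegalArgumentException("Undeclared field: " + field.getKey());
      }
      // Behaviour.IGNORE: the undeclared field is skipped.
    }
    return accepted;
  }

  public static void main(String[] args) {
    Set<String> declared = new HashSet<>(Arrays.asList("id", "val"));
    Map<String, Object> document = new LinkedHashMap<>();
    document.put("id", 1L);
    document.put("extra", "not in the schema");
    System.out.println(filter(declared, document, Behaviour.IGNORE));
  }
}
```

A strict policy suits pipelines where an unknown field signals a producer bug; a lenient one suits consumers reading documents written against a newer schema.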
Jasvorno is based on the JsonUtil class from the Kite project.
Created by Elliot West, with thanks to Adrian Woodhead, Dave Maughan, and James Grant.
This project is available under the Apache 2.0 License.
Copyright 2016-2017 Expedia Inc.