FR: implement Avro output
candlerb opened this issue · 2 comments
Avro is a much more compact output format than JSON, whilst being straightforward to convert to/from JSON. It is natively supported in Kafka and has good support for schema evolution.
@candlerb I was literally looking at doing something along these lines over the last few days. I just want to confirm that what you are asking for is what I am planning to do.
So you want Avro as a configurable output encoding, as an alternative to JSON, MessagePack, etc.?
Exactly.
When you dig down a bit more, this can mean a couple of different things:
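- When sending to Kafka, write Avro single-object encoding, which prefixes each message with a fingerprint of the schema. I believe the Confluent platform and its schema registry take a similar approach, although Confluent's wire format prefixes each message with a registry-assigned schema ID rather than a fingerprint.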
- When writing to disk, write Avro object container files: a header holding the schema and a randomly generated 16-byte sync marker, followed by blocks of records, each block terminated by that sync marker. These are convenient for map-reduce, and for seeking within a large file (e.g. binary chop), since a reader can jump to any offset and scan forward to the next sync marker.
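To make the first option concrete, here is a minimal sketch of Avro single-object encoding using only the Python standard library: a 2-byte marker `C3 01`, the 8-byte little-endian CRC-64-AVRO fingerprint of the schema's Parsing Canonical Form, then the binary-encoded record. The `Flow` schema (fields `src` and `bytes`) and all helper names are hypothetical, chosen for illustration, and the schema string is assumed to be written in Parsing Canonical Form by hand rather than computed. In practice a library such as the official `avro` package or `fastavro` would do this; the sketch only shows what ends up on the wire.

```python
import json
import struct

# CRC-64-AVRO (Rabin) fingerprint, following the algorithm in the Avro specification.
EMPTY = 0xC15D213AA4D7A795
FP_TABLE = []
for _i in range(256):
    _fp = _i
    for _ in range(8):
        _fp = (_fp >> 1) ^ (EMPTY & -(_fp & 1))
    FP_TABLE.append(_fp)


def fingerprint64(data: bytes) -> int:
    """64-bit fingerprint of `data` (normally a schema's Parsing Canonical Form)."""
    fp = EMPTY
    for b in data:
        fp = (fp >> 8) ^ FP_TABLE[(fp ^ b) & 0xFF]
    return fp


def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag encode, then write as a little-endian base-128 varint."""
    v = (n << 1) ^ (n >> 63)  # zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while v > 0x7F:
        out.append((v & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        v >>= 7
    out.append(v)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Hypothetical schema, assumed to be already in Parsing Canonical Form
# (no whitespace, canonical attribute order: name, type, fields).
FLOW_SCHEMA_PCF = (
    '{"name":"Flow","type":"record","fields":['
    '{"name":"src","type":"string"},{"name":"bytes","type":"long"}]}'
)


def encode_flow(record: dict) -> bytes:
    # A record is its fields encoded back to back in schema order; no names or tags.
    return encode_string(record["src"]) + encode_long(record["bytes"])


def single_object(schema_pcf: str, datum: bytes) -> bytes:
    """Marker C3 01, 8-byte little-endian schema fingerprint, then the datum."""
    fp = fingerprint64(schema_pcf.encode("utf-8"))
    return b"\xc3\x01" + struct.pack("<Q", fp) + datum


if __name__ == "__main__":
    record = {"src": "192.0.2.1", "bytes": 1500}
    msg = single_object(FLOW_SCHEMA_PCF, encode_flow(record))
    print(len(msg), "bytes as Avro vs", len(json.dumps(record)), "bytes as JSON")
```

The per-message overhead is a fixed 10 bytes, and the fingerprint lets a consumer look up the writer's schema without it being embedded in every message.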
Both would be nice to have, but I think the first is more important.
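For the second option, a minimal sketch of an Avro object container file writer, again stdlib-only: the magic bytes `Obj\x01`, file metadata as an Avro `map<bytes>` carrying `avro.schema` and `avro.codec`, a random 16-byte sync marker, then blocks of (object count, payload size, payload, sync marker). It assumes the uncompressed `null` codec only, and the `ContainerWriter` class, its `block_size` parameter and the `Flow` schema are hypothetical names for illustration, not an existing API.

```python
import io
import json
import os


def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag encode, then write as a little-endian base-128 varint."""
    v = (n << 1) ^ (n >> 63)
    out = bytearray()
    while v > 0x7F:
        out.append((v & 0x7F) | 0x80)
        v >>= 7
    out.append(v)
    return bytes(out)


def encode_bytes(b: bytes) -> bytes:
    return encode_long(len(b)) + b


def encode_string(s: str) -> bytes:
    return encode_bytes(s.encode("utf-8"))


class ContainerWriter:
    """Hypothetical minimal writer for Avro container files, "null" codec only."""

    def __init__(self, out, schema: dict, block_size: int = 100):
        self.out = out
        self.block_size = block_size  # records per block (illustrative parameter)
        self.sync = os.urandom(16)  # random per-file sync marker
        self.pending = []
        meta = {"avro.schema": json.dumps(schema).encode("utf-8"), "avro.codec": b"null"}
        header = bytearray(b"Obj\x01")
        # File metadata is a map<bytes>: one block of entries, then a zero count.
        header += encode_long(len(meta))
        for key, value in meta.items():
            header += encode_string(key) + encode_bytes(value)
        header += encode_long(0)
        header += self.sync
        out.write(bytes(header))

    def append(self, datum: bytes) -> None:
        """Queue one already-encoded record; write a block when enough are queued."""
        self.pending.append(datum)
        if len(self.pending) >= self.block_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        payload = b"".join(self.pending)
        # Block layout: object count, payload size in bytes, payload, sync marker.
        self.out.write(
            encode_long(len(self.pending)) + encode_long(len(payload)) + payload + self.sync
        )
        self.pending = []


if __name__ == "__main__":
    schema = {"type": "record", "name": "Flow",
              "fields": [{"name": "src", "type": "string"}, {"name": "bytes", "type": "long"}]}
    buf = io.BytesIO()
    writer = ContainerWriter(buf, schema, block_size=2)
    for i in range(5):
        writer.append(encode_string("192.0.2.1") + encode_long(i))
    writer.flush()
    print(len(buf.getvalue()), "bytes written")
```

Because the same sync marker closes every block, a reader that lands at an arbitrary offset can resynchronise by searching for it, which is what makes splitting and seeking cheap.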