Phillipmartin/gopassivedns

FR: implement Avro output

candlerb opened this issue · 2 comments

Avro is a much more compact output format than JSON, whilst being straightforward to convert to/from JSON. It is natively supported in Kafka and has good support for schema evolution.

@candlerb I was literally looking at doing something along these lines over the last few days. I just want to confirm that what you are asking for matches what I have in mind.

You want Avro as a configurable output encoding type, instead of JSON/MessagePack etc.?

Exactly.
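One way to model a configurable output encoding is a small encoder interface chosen by a config value. This is a minimal sketch under that assumption: the `Encoder` interface, the `encoders` registry and `newEncoder` are hypothetical names, not gopassivedns's current code. An Avro encoder would simply be registered as another entry.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// Encoder turns a log entry into bytes for an output sink.
// Hypothetical interface, for illustration only.
type Encoder interface {
	Encode(v any) ([]byte, error)
}

type jsonEncoder struct{}

func (jsonEncoder) Encode(v any) ([]byte, error) { return json.Marshal(v) }

// encoders maps a config value to an implementation; an Avro encoder
// would be added here under the key "avro".
var encoders = map[string]Encoder{
	"json": jsonEncoder{},
}

// newEncoder looks up an encoder by name (case-insensitive) and reports
// the supported names if the lookup fails.
func newEncoder(name string) (Encoder, error) {
	if e, ok := encoders[strings.ToLower(name)]; ok {
		return e, nil
	}
	names := make([]string, 0, len(encoders))
	for n := range encoders {
		names = append(names, n)
	}
	sort.Strings(names)
	return nil, fmt.Errorf("unknown encoding %q (supported: %s)", name, strings.Join(names, ", "))
}

func main() {
	enc, err := newEncoder("json")
	if err != nil {
		fmt.Println(err)
		return
	}
	out, _ := enc.Encode(map[string]int{"ttl": 300})
	fmt.Println(string(out))
	if _, err := newEncoder("avro"); err != nil {
		fmt.Println(err) // not registered in this sketch
	}
}
```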

When you dig down a bit more, this can mean a couple of different things:

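  1. When sending to Kafka, write Avro single-object encoding, which prefixes each message with a fingerprint of its schema. I believe the Confluent platform and its schema registry take a similar approach, although Confluent's wire format identifies the schema by a registry-assigned ID rather than a fingerprint.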

  2. When writing to disk, write Avro object container files, which hold the schema in the file header followed by blocks of records, each block terminated by a random 16-byte sync marker. These are convenient for map-reduce and for seeking within a large file (e.g. binary chop), since a reader can jump to any offset and resynchronise on the next marker.

Both would be nice to have, but I think the first is more important.