This repository contains code to test the Avro Serialization framework in Python 3. The Avro library for Python 3 uses a slightly different API from the Python 2 version, so bear that in mind when running the project.

- `avro_tutorial.py` contains very basic code to test Avro (the same example provided in the Avro documentation). It takes the schema in `schemas/test.avsc`, serializes objects according to that schema, writes them to a file in `serialized_data/avro_tutorial`, and then reads the file back and deserializes the data. A minimal sketch of this flow is shown after the list.
- `avro_kafka_csi_producer.py` is a script that consumes a set of events from a configurable Kafka stream (properties obtained from `config.ini`), tries to serialize each event according to the schema in `schemas/customer_status_changes.avsc`, and on success publishes the serialized event via a configurable Kafka producer. A sketch of the producer side is shown after the list.
- `avro_kafka_csi_consumer.py` is a script that consumes serialized events from the same Kafka queue to which `avro_kafka_csi_producer.py` writes, deserializes them according to the schema in `schemas/customer_status_changes.avsc`, and prints the result. A sketch of the consumer side is shown after the list.
- `avro_spark.py` is a script that uses Spark to read a file containing data serialized with Avro. A sketch is shown after the list.
- `flume.properties` is an example configuration for a Flume agent that consumes Avro-serialized events from Kafka and stores them in a file with a `.avro` extension. This Flume agent has not been successfully tested yet.
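
For reference, here is a minimal sketch of the tutorial flow, assuming the `avro-python3` package (where schema parsing is `Parse` rather than `parse`, one of the Python 2/3 differences mentioned above). The record fields come from the official Avro getting-started example and may not match the fields defined in `schemas/test.avsc`.

```python
# Minimal sketch of the avro_tutorial.py flow, assuming avro-python3.
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
import avro.schema

# Parse the schema that records must conform to.
schema = avro.schema.Parse(open("schemas/test.avsc", "r").read())

# Serialize a couple of records to an Avro container file.
# (Field names are illustrative, taken from the official Avro tutorial.)
writer = DataFileWriter(open("serialized_data/avro_tutorial", "wb"),
                        DatumWriter(), schema)
writer.append({"name": "Alyssa", "favorite_number": 256})
writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
writer.close()

# Read the file back and deserialize the records.
reader = DataFileReader(open("serialized_data/avro_tutorial", "rb"),
                        DatumReader())
for record in reader:
    print(record)
reader.close()
```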
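A sketch of the producer side under stated assumptions: `kafka-python` for the Kafka clients, JSON-encoded incoming events, and placeholder topic names and broker address where the real script reads these values from `config.ini`.

```python
# Sketch of the producer-side flow; topic names, broker address and the
# assumption that incoming events are JSON are all illustrative.
import io
import json

import avro.schema
from avro.io import AvroTypeException, BinaryEncoder, DatumWriter
from kafka import KafkaConsumer, KafkaProducer

schema = avro.schema.Parse(
    open("schemas/customer_status_changes.avsc", "r").read())
writer = DatumWriter(schema)

consumer = KafkaConsumer("raw_events",                     # hypothetical input topic
                         bootstrap_servers="localhost:9092")
producer = KafkaProducer(bootstrap_servers="localhost:9092")

for message in consumer:
    event = json.loads(message.value)        # assumes events arrive as JSON
    buffer = io.BytesIO()
    try:
        # Serialize the event according to the Avro schema.
        writer.write(event, BinaryEncoder(buffer))
    except AvroTypeException:
        continue                             # skip events that do not match the schema
    # On success, publish the serialized bytes to the output topic.
    producer.send("customer_status_changes_avro", buffer.getvalue())  # hypothetical topic
```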
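The consumer side, under the same assumptions, reverses the process with a `DatumReader`.

```python
# Sketch of the consumer-side flow; the topic name matches the producer sketch above.
import io

import avro.schema
from avro.io import BinaryDecoder, DatumReader
from kafka import KafkaConsumer

schema = avro.schema.Parse(
    open("schemas/customer_status_changes.avsc", "r").read())
reader = DatumReader(schema)

consumer = KafkaConsumer("customer_status_changes_avro",   # hypothetical topic
                         bootstrap_servers="localhost:9092")

for message in consumer:
    # Deserialize the raw bytes back into a Python dict and print it.
    event = reader.read(BinaryDecoder(io.BytesIO(message.value)))
    print(event)
```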
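A sketch of reading Avro-serialized data with Spark, assuming PySpark with the spark-avro package on the classpath (e.g. started with `--packages org.apache.spark:spark-avro_2.12:<version>`; older setups use the `com.databricks.spark.avro` format string instead). The input path is illustrative.

```python
# Sketch of reading an Avro container file with Spark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("avro_spark").getOrCreate()

# The built-in "avro" format is available once spark-avro is on the classpath.
df = spark.read.format("avro").load("serialized_data/avro_tutorial")
df.show()
```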