Producer -> Kafka -> Consumer
- Use Avro serialization (version 1.8.2) for messages;
- Kafka should store serialized entities;
- Producer should either generate entities or read them by offset from prepared files, and send them to the event hub;
- Producer should be based on Apache streaming;
- Consumer should also be based on Apache streaming: it should read messages from Kafka, deserialize them and write them to a log;
- For the consumer, create a mechanism that stores the offsets (in MS SQL) for each batch, and use the offsets from the DB when the job starts;
- Use a direct stream for both the Consumer and the Producer;
- You can design the entity structure yourself. It should contain several fields of types such as List, org.joda.time.DateTime, long, etc.;
- Write an integration test for the whole pipeline and check the input pool of messages against the output pool of messages;
- Use a logger (SLF4J or an alternative);
- Extract all configs to external properties files;
- Use Maven.
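
The offset-storing requirement for the consumer can be sketched as follows. This is a minimal illustration in plain Java under stated assumptions, not a definitive implementation: the names OffsetTrackingSketch, Partition, OffsetStore, InMemoryOffsetStore and startingOffset are hypothetical, the topic name "entities" is made up, and the in-memory map only stands in for the MS SQL table, which a real job would reach through JDBC. It shows the intended flow: after each batch, save the last processed offset per topic partition; when the job starts, load the saved offsets and fall back to a default if none exist.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class OffsetTrackingSketch {

    /** Identifies one topic partition (hypothetical stand-in for Kafka's own type). */
    public static final class Partition {
        final String topic;
        final int partition;

        public Partition(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Partition)) return false;
            Partition p = (Partition) o;
            return partition == p.partition && topic.equals(p.topic);
        }

        @Override
        public int hashCode() {
            return Objects.hash(topic, partition);
        }
    }

    /** Hypothetical storage contract; a real version would read and write an MS SQL table. */
    public interface OffsetStore {
        void saveBatch(Map<Partition, Long> untilOffsets);
        Map<Partition, Long> loadAll();
    }

    /** In-memory stand-in for the database table, for illustration only. */
    public static final class InMemoryOffsetStore implements OffsetStore {
        private final Map<Partition, Long> table = new HashMap<>();

        @Override
        public void saveBatch(Map<Partition, Long> untilOffsets) {
            // Overwrite the stored offset of every partition touched by this batch.
            table.putAll(untilOffsets);
        }

        @Override
        public Map<Partition, Long> loadAll() {
            // Return a copy so callers cannot modify the stored state.
            return new HashMap<>(table);
        }
    }

    /** Returns the stored offset for a partition, or the fallback when nothing is stored. */
    public static long startingOffset(OffsetStore store, Partition p, long fallback) {
        return store.loadAll().getOrDefault(p, fallback);
    }

    public static void main(String[] args) {
        OffsetStore store = new InMemoryOffsetStore();
        Partition p0 = new Partition("entities", 0); // topic name is an assumption

        // First start: nothing stored yet, so the fallback offset is used.
        System.out.println("first start from offset " + startingOffset(store, p0, 0L));

        // A batch covering offsets [0, 100) was processed: store the end offset.
        Map<Partition, Long> batch = new HashMap<>();
        batch.put(p0, 100L);
        store.saveBatch(batch);

        // Restart: the job resumes from the stored offset.
        System.out.println("restart from offset " + startingOffset(store, p0, 0L));
    }
}
```

Keeping the store behind an interface lets the integration test run against the in-memory version while the job itself uses a database-backed one.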