Aletheia is a framework for implementing high-volume, datum (event) based, producer-consumer data pipelines. Its key features are:
- A uniform API
- Fine-grained visibility
- Multiple serialization formats
- Schema evolution support
Aletheia supports the following producers/consumers out of the box:
- Kafka 0.8 (production and consumption)
- Kafka 0.7 (production and consumption)
- Log files (production only)
Custom producer/consumer types are easy to write. See the Wiki for details.
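Judging by the builder examples in this README, a custom type appears to plug in by registering an endpoint class together with a factory (`registerProductionEndPointType`, `registerConsumptionEndPointType`). The plain-Java sketch below illustrates that type-keyed registry idea in isolation. `EndPoint`, `Sender`, `SenderFactory` and `EndPointRegistrySketch` are hypothetical names, not Aletheia's actual interfaces; the Wiki describes the real extension points.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: none of these types are Aletheia classes.
public class EndPointRegistrySketch {

    // Marker for a destination that datums can be delivered to (hypothetical).
    public interface EndPoint {}

    // Sends a serialized payload to a concrete destination (hypothetical).
    public interface Sender {
        void send(byte[] payload);
    }

    // Creates a Sender for a given endpoint instance (hypothetical).
    public interface SenderFactory<E extends EndPoint> {
        Sender create(E endPoint);
    }

    private final Map<Class<?>, SenderFactory<?>> factories = new HashMap<>();

    // Associates an endpoint type with the factory that knows how to send to it.
    public <E extends EndPoint> void register(Class<E> type, SenderFactory<E> factory) {
        factories.put(type, factory);
    }

    // Looks up the factory registered for the endpoint's exact runtime class.
    @SuppressWarnings("unchecked")
    public <E extends EndPoint> Sender senderFor(E endPoint) {
        SenderFactory<E> factory = (SenderFactory<E>) factories.get(endPoint.getClass());
        if (factory == null) {
            throw new IllegalArgumentException(
                "No factory registered for " + endPoint.getClass().getName());
        }
        return factory.create(endPoint);
    }
}
```

Keying on the endpoint class keeps the core agnostic of transports: adding a new destination means adding one endpoint class and one factory, without touching the code that produces datums.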
Build a Datum Producer once:
```java
DatumProducer<Click> datumProducer =
    DatumProducerBuilder
        .forDomainClass(Click.class)
        .registerProductionEndPointType(KafkaTopicProductionEndPoint.class,
                                        new KafkaDatumEnvelopeSenderFactory())
        .deliverDataTo(new KafkaTopicProductionEndPoint(...),
                       new JsonDatumSerDe<Click>(Click.class))
        .build(new DatumProducerConfig(...));
```
Then, produce away:
```java
datumProducer.deliver(new Click(...));
```
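The README elides the definition of `Click`. Below is a minimal sketch of what such a domain class could look like, assuming a plain POJO; the fields `pageId` and `timestamp` are hypothetical, and Aletheia may impose additional requirements on domain classes (see the Wiki).

```java
// Hypothetical domain class: the README does not show Click's fields,
// so pageId and timestamp are illustrative only.
public class Click {

    private String pageId;
    private long timestamp;

    // No-arg constructor kept on the assumption that JSON deserialization
    // needs one; check the requirements of the SerDe you use.
    public Click() {
    }

    public Click(String pageId, long timestamp) {
        this.pageId = pageId;
        this.timestamp = timestamp;
    }

    public String getPageId() { return pageId; }

    public void setPageId(String pageId) { this.pageId = pageId; }

    public long getTimestamp() { return timestamp; }

    public void setTimestamp(long timestamp) { this.timestamp = timestamp; }
}
```

With a class like this, a delivery call would read `datumProducer.deliver(new Click("home", System.currentTimeMillis()));`.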
Build a DatumConsumerStream list once:
```java
List<DatumConsumerStream<Click>> datumConsumerStreams =
    DatumConsumerStreamsBuilder
        .forDomainClass(Click.class)
        .registerConsumptionEndPointType(KafkaTopicConsumptionEndPoint.class,
                                         new KafkaDatumEnvelopeFetcherFactory())
        .consumeDataFrom(new KafkaTopicConsumptionEndPoint(...),
                         new JsonDatumSerDe<Click>(Click.class))
        .build(new DatumConsumerStreamConfig(...));
```
Then, consume away:
```java
// parallelism is 1, so the list holds a single DatumConsumerStream: take it and forget about the list
// (Iterables is Guava's com.google.common.collect.Iterables)
DatumConsumerStream<Click> clickStream = Iterables.getFirst(datumConsumerStreams, null);

// blocking: the loop waits for new datums to arrive
for (final Click click : clickStream.datums()) {
  // handling logic goes here
}
```
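To make the blocking loop concrete, here is a hypothetical stand-in, not Aletheia's `DatumConsumerStream`: an endless `Iterable` whose `next()` blocks on a `java.util.concurrent.BlockingQueue` until a datum is available. It also shows a plain-JDK alternative to Guava's `Iterables.getFirst(list, null)` for readers who do not use Guava. The class name `BlockingStreamSketch` and the method `firstOrNull` are illustrative assumptions.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for a consumer stream: NOT Aletheia's DatumConsumerStream.
// It models an endless Iterable whose next() blocks until a datum arrives.
public class BlockingStreamSketch<T> implements Iterable<T> {

    private final BlockingQueue<T> queue;

    public BlockingStreamSketch(BlockingQueue<T> queue) {
        this.queue = queue;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return true; // the stream never ends; callers break out explicitly
            }

            @Override
            public T next() {
                try {
                    return queue.take(); // blocks until a datum is available
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new IllegalStateException("Interrupted while waiting for a datum", e);
                }
            }
        };
    }

    // Plain-JDK equivalent of Guava's Iterables.getFirst(list, null).
    public static <S> S firstOrNull(List<S> items) {
        return items.isEmpty() ? null : items.get(0);
    }
}
```

Because `hasNext()` always returns `true`, the enhanced `for` loop only exits via `break`, an exception, or thread interruption, which matches the "consume forever" shape of the loop above.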
First, include the aletheia-core jar in your pom.xml:
```xml
<dependency>
  <groupId>com.outbrain.aletheia</groupId>
  <artifactId>aletheia-core</artifactId>
  <version>x.y</version>
</dependency>
```
Then, include the Aletheia extensions you'll be using, which can be one or more of the following:
```xml
<dependency>
  <groupId>com.outbrain.aletheia</groupId>
  <artifactId>aletheia-kafka0.8</artifactId>
  <version>x.y</version>
</dependency>

<dependency>
  <groupId>com.outbrain.aletheia</groupId>
  <artifactId>aletheia-log4j</artifactId>
  <version>x.y</version>
</dependency>
```
If you prefer to build Aletheia yourself, please see the Hello Datum! wiki page.
Aletheia has been developed by the data infrastructure team at Outbrain.
Please feel free to contact us for any details:
- Stas Levin - slevin@outbrain.com
- Harel Ben-Attia - harel@outbrain.com
- Izik Shmulewitz - ishmulewitz@outbrain.com