Aletheia

Outbrain's data pipeline framework

Primary language: Java. License: Apache License 2.0.

Aletheia is a framework for implementing high-volume, datum (event) based, producer-consumer data pipelines. Its key features are:

  • A uniform API
  • Fine-grained visibility
  • Multiple serialization formats
  • Schema evolution support

Aletheia supports the following producers/consumers out of the box:

  • Kafka 0.8 (production and consumption)
  • Kafka 0.7 (production and consumption)
  • Log files (production only)

Custom producer/consumer types are easy to write. See the Wiki for details.

Datum Production Example

Build a DatumProducer once:

DatumProducer<Click> datumProducer = 
    DatumProducerBuilder
      .forDomainClass(Click.class)
      .registerProductionEndPointType(KafkaTopicProductionEndPoint.class,
                                      new KafkaDatumEnvelopeSenderFactory())
      .deliverDataTo(new KafkaTopicProductionEndPoint(...), 
                     new JsonDatumSerDe<Click>(Click.class))
      .build(new DatumProducerConfig(...));

Then, produce away:

datumProducer.deliver(new Click(...));

Datum Consumption Example

Build a DatumConsumerStream list once:

List<DatumConsumerStream<Click>> datumConsumerStreams =
    DatumConsumerStreamsBuilder
        .forDomainClass(Click.class)
        .registerConsumptionEndPointType(KafkaTopicConsumptionEndPoint.class,
                                         new KafkaDatumEnvelopeFetcherFactory())
        .consumeDataFrom(new KafkaTopicConsumptionEndPoint(...), 
                         new JsonDatumSerDe<Click>(Click.class))
        .build(new DatumConsumerStreamConfig(...));

Then, consume away:

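// Parallelism is 1, so we take the first DatumConsumerStream and ignore the rest of the list.
// Iterables is Guava's com.google.common.collect.Iterables; getFirst returns null for an empty list.
DatumConsumerStream<Click> clickStream = Iterables.getFirst(datumConsumerStreams, null);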

// Blocking iteration
for (final Click click : clickStream.datums()) {
    // handling logic goes here
}
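When the parallelism is greater than 1, each stream in the list needs its own blocking loop. The following is a minimal sketch of one way to do that with a plain thread pool. It is not part of Aletheia: the class name ParallelConsumeSketch and the method consumeAll are hypothetical, and plain lists of strings stand in for the datums() of each DatumConsumerStream, on the assumption that datums() can be treated as an Iterable, as the for-each loop above suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class ParallelConsumeSketch {

    // Runs one blocking for-each loop per stream, each on its own worker thread.
    // The handler is called from several threads, so it must be thread-safe.
    // The list of streams must not be empty (a pool needs at least one thread).
    public static <T> void consumeAll(List<? extends Iterable<T>> streams, Consumer<T> handler)
            throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(streams.size());
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Iterable<T> stream : streams) {
                pending.add(executor.submit(() -> {
                    for (T datum : stream) {
                        handler.accept(datum);
                    }
                }));
            }
            // Wait for every loop to finish; with a stream that never ends, this never returns.
            for (Future<?> future : pending) {
                future.get();
            }
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-ins for the datums() of two DatumConsumerStream instances.
        List<List<String>> streams = List.of(List.of("click-1", "click-2"), List.of("click-3"));
        List<String> handled = new CopyOnWriteArrayList<>();
        consumeAll(streams, handled::add);
        System.out.println(handled.size() + " datums handled");
    }
}
```

In real use, each element passed to consumeAll would be the datums() of one stream from the list built earlier, and the handler would hold the per-datum logic. Ordering across streams is not preserved in this sketch.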

Usage

First, add the aletheia-core dependency to your pom.xml:

<dependency>
  <groupId>com.outbrain.aletheia</groupId>
  <artifactId>aletheia-core</artifactId>
  <version>x.y</version>
</dependency>

Then, add the Aletheia extensions you'll be using, which can be one or more of the following:

<dependency>
  <groupId>com.outbrain.aletheia</groupId>
  <artifactId>aletheia-kafka0.8</artifactId>
  <version>x.y</version>
</dependency>
<dependency>
  <groupId>com.outbrain.aletheia</groupId>
  <artifactId>aletheia-log4j</artifactId>
  <version>x.y</version>
</dependency>

If you prefer to build Aletheia yourself, see the Hello Datum! wiki page.

Documentation

Developers

Aletheia has been developed by the data infrastructure team at Outbrain.
Please feel free to contact us for any details: