/data-pipeline-storm

Translating an Event Hub stream to chunky blobs using Apache Storm and Trident

Primary language: Java. License: MIT.

Data Pipeline Guidance (with Apache Storm)

Microsoft patterns & practices

This project focuses on using Apache Storm/Trident with Java. For guidance on using .NET without Storm, see the companion Data Pipeline Guidance.

Overview

The two primary concerns of this project are:

  • Facilitating cold storage of data for later analytics. That is, translating the chatty stream of events into chunky blobs.

  • Demonstrating how to use OpaqueTridentEventHubSpout and Apache Storm/Trident to store Microsoft Azure Event Hubs messages in Microsoft Azure Blob storage with exactly-once semantics.
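The "chatty to chunky" idea in the first concern can be illustrated with a minimal sketch in plain Java. This is not the project's code: the class name `EventChunker`, the `maxBlockBytes` parameter, and the newline-delimited layout are all assumptions made for illustration, and no Storm or Azure APIs are involved. It only shows how many small event payloads might be packed into a few larger blocks before being written to storage.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of this project): packs small event
 * payloads into larger newline-delimited blocks.
 */
final class EventChunker {
    private final int maxBlockBytes;

    EventChunker(int maxBlockBytes) {
        if (maxBlockBytes <= 0) {
            throw new IllegalArgumentException("maxBlockBytes must be positive");
        }
        this.maxBlockBytes = maxBlockBytes;
    }

    /**
     * Groups events in arrival order. A block is closed as soon as adding
     * the next event would push it past maxBlockBytes. A single event that
     * is larger than the limit still gets a block of its own.
     */
    List<String> chunk(List<String> events) {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String event : events) {
            String line = event + "\n";
            int size = line.getBytes(StandardCharsets.UTF_8).length;
            if (currentBytes > 0 && currentBytes + size > maxBlockBytes) {
                blocks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(line);
            currentBytes += size;
        }
        if (currentBytes > 0) {
            blocks.add(current.toString());
        }
        return blocks;
    }
}
```

In a real topology the block size would likely be much larger than an individual event, so that each write to blob storage carries many events.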

Next Steps

Backlog

  • Performance results: The performance results will be published once we finish the performance test.

  • Using ZooKeeper to store the state: The current sample stores state in Redis Cache. We plan to replace that with ZooKeeper.
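This README does not describe what state the sample actually keeps. As a rough illustration of why a state store matters for exactly-once processing with an opaque spout, the sketch below shows the general opaque-transactional update rule described in the Trident documentation: keep the transaction id, the previous value, and the current value, and recompute from the previous value when a batch is replayed. The class `OpaqueStateStore`, its `apply` method, and the in-memory `HashMap` standing in for Redis or ZooKeeper are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for an external state store.
 * Tracks a running total per key using the opaque-transactional rule.
 */
final class OpaqueStateStore {
    /** Immutable record of the last applied batch for one key. */
    static final class Entry {
        final long txId;
        final long prev;
        final long curr;

        Entry(long txId, long prev, long curr) {
            this.txId = txId;
            this.prev = prev;
            this.curr = curr;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    /**
     * Applies a batch delta for the given transaction id and returns the
     * new total. If the same txId is seen again (a replayed batch, possibly
     * with different contents), the total is recomputed from the previous
     * value instead of being added a second time.
     */
    long apply(String key, long txId, long delta) {
        Entry existing = store.get(key);
        Entry updated;
        if (existing == null) {
            updated = new Entry(txId, 0L, delta);
        } else if (existing.txId == txId) {
            updated = new Entry(txId, existing.prev, existing.prev + delta);
        } else {
            updated = new Entry(txId, existing.curr, existing.curr + delta);
        }
        store.put(key, updated);
        return updated.curr;
    }
}
```

Swapping the backing store (for example, Redis for ZooKeeper) would change where `Entry` values are persisted, not the update rule itself.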