Building event-driven data ingestion pipelines in Azure

Primary language: C#. License: MIT.

azure-event-driven-data-pipeline

Problem

A large retailer with many source systems wants a single source of truth for its data and the ability to send updates to its consumers whenever that data changes. The solution must support an unpredictable load, with spikes of up to 1,500 requests per second.

This blog post describes the contents of this repo in detail.

Architecture

Deployment

The entire deployment can be orchestrated using the ARM template azuredeploy.json.

To deploy using the Azure CLI (newer CLI versions spell this command az deployment group create):

az group deployment create -g <RESOURCE_GROUP> --template-file azuredeploy.json

Once the deployment is complete, the only manual step is to copy the ConsumerReceiveFunc URL from the Azure portal and paste it multiple times, pipe-delimited (|), into ConsumerEgressFunc -> App Settings -> CONSUMERS.
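As a rough illustration of the pipe-delimited CONSUMERS value, the sketch below repeats one URL a given number of times and joins the copies with |. The helper name build_consumers, the example URL, and the function app name placeholder are assumptions made for illustration; they are not part of this repo.

```shell
# Hypothetical helper (not part of this repo): repeat a consumer URL
# COUNT times and join the copies with "|", the delimiter CONSUMERS expects.
build_consumers() {
  url="$1"
  count="$2"
  value=""
  i=0
  while [ "$i" -lt "$count" ]; do
    if [ -z "$value" ]; then
      value="$url"
    else
      value="$value|$url"
    fi
    i=$((i + 1))
  done
  printf '%s\n' "$value"
}

# Example with a placeholder URL:
#   build_consumers "https://example.invalid/api/ConsumerReceiveFunc" 3
#
# The result could then be applied without the portal, assuming the function
# app name is known (the name below is a placeholder):
#   az functionapp config appsettings set -g <RESOURCE_GROUP> \
#     -n <CONSUMER_EGRESS_FUNCTION_APP> --settings "CONSUMERS=<VALUE>"
```

Each repeated entry simulates one more consumer receiving every update, so the count controls the fan-out that ConsumerEgressFunc has to handle.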

Running load tests

We run the load tests using Azure Container Instances. After creating the resources with the ARM template above, run the following load-testing script:

./generate-load.sh <RESOURCE_GROUP> <CONTAINER_NAME> https://http-ingress-func.azurewebsites.net/api/HttpIngressFunc?code=<FUNCTION_KEY>

To stream logs from the container:

az container attach -g <RESOURCE_GROUP> -n <CONTAINER_NAME>

Measuring Cosmos DB RUs using Application Insights

When we upsert into Cosmos DB, we log the Request Units (RUs) consumed to Application Insights. The following Application Insights Analytics query renders a timechart of the average RUs consumed, aggregated into 10-second bins.

customMetrics
| where timestamp > datetime("2018-03-05T12:26:00")
    and name == "product_RU"
| summarize avg(value) by name, bin(timestamp, 10s)
| render timechart
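To make the binning in the query concrete, here is a minimal local sketch of what summarize avg(value) by bin(timestamp, 10s) computes. It assumes input lines of the form "<epoch_seconds> <value>"; the function name bin_avg and that input format are illustrative assumptions, not something Application Insights produces.

```shell
# Hypothetical helper: average "<epoch_seconds> <value>" samples into
# fixed-width time bins, mirroring avg(value) by bin(timestamp, 10s).
bin_avg() {
  width="${1:-10}"
  awk -v w="$width" '
    {
      b = int($1 / w) * w   # floor the timestamp to the start of its bin
      sum[b] += $2
      n[b]++
    }
    END {
      for (b in sum) printf "%d %.2f\n", b, sum[b] / n[b]
    }
  ' | sort -n
}

# Example: two samples fall in the bin starting at 100, one in the bin at 110.
#   printf '100 10\n105 20\n112 6\n' | bin_avg 10
```

Because each point on the chart is an average over its bin, a short burst of expensive upserts can be smoothed out; narrowing the bin width shows spikes more clearly.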

Resources

Choose between Azure services that deliver messages

Choose between Flow, Logic Apps, Functions, and WebJobs

Durable Functions overview

Understanding Serverless Cold Start

Azure Function Apps: Performance Considerations

Processing 100,000 Events Per Second on Azure Functions

Choose the right data store

Modeling document data for NoSQL databases

A fast, serverless, big data pipeline powered by a single Azure Function

Load testing with Azure Container Instances and wrk