ADF.procfwk

A metadata driven processing framework for Azure Data Factory supported by Azure SQLDB and Azure Functions

Primary language: TSQL. License: MIT.

Code Project Overview

This open source code project delivers a simple metadata driven processing framework for Azure Data Factory (ADF). The framework is made possible by coupling ADF with an Azure SQL Database that houses execution stage and pipeline information; the pipelines are then called using an Azure Functions App. The parent/child metadata structure first allows stages of dependencies to be executed in sequence. Second, it allows all pipelines within a stage to be executed in parallel, offering scaled-out control flows where no inter-dependencies exist.
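The staged control flow described above can be illustrated with a minimal Python sketch. It is not the framework's implementation (which lives in ADF pipelines, TSQL and an Azure Functions App); the metadata rows, `run_worker` and `run_framework` are hypothetical names used only to show stages running in sequence while the pipelines inside each stage run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby

# Hypothetical metadata rows: (stage_id, pipeline_name).
# In the real framework this information is held in the SQL database.
METADATA = [
    (1, "IngestCustomers"),
    (1, "IngestOrders"),
    (2, "TransformSales"),
    (3, "PublishReport"),
]


def run_worker(pipeline_name):
    """Stand-in for triggering a Worker pipeline; returns a status record."""
    return {"pipeline": pipeline_name, "status": "Success"}


def run_framework(metadata, worker=run_worker):
    """Run stages one after another; run pipelines within a stage in parallel."""
    results = []
    # Sort by stage only; the sort is stable, so order within a stage is kept.
    ordered = sorted(metadata, key=lambda row: row[0])
    for stage_id, rows in groupby(ordered, key=lambda row: row[0]):
        names = [name for _, name in rows]
        # Leaving the 'with' block waits for every pipeline in this stage,
        # so the next stage cannot start until this one has finished.
        with ThreadPoolExecutor(max_workers=len(names)) as pool:
            stage_results = list(pool.map(worker, names))
        results.append((stage_id, stage_results))
    return results
```

Calling `run_framework(METADATA)` returns one entry per stage, in stage order, each holding the status records of that stage's pipelines. The executor boundary acts as the barrier between stages, which is the only ordering guarantee the sketch makes.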

The framework is designed to integrate with any existing Data Factory solution by making the lowest level executor a standalone Worker pipeline that is wrapped in a higher level of controlled (sequential) dependencies. This level of abstraction means that, operationally, nothing about the monitoring of orchestration processes is hidden in multiple levels of dynamic activity calls. Instead, everything from the processing pipeline doing the work (the Worker) can be inspected using out-of-the-box ADF features.

This framework can also be used in any Azure Tenant and allows the creation of complex control flows across multiple Data Factory resources by connecting Service Principal details through metadata to targeted Subscriptions > Resource Groups > Data Factories and Pipelines. This offers very granular administration over data processing components in a given environment.
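As a rough illustration of that Subscription > Resource Group > Data Factory > Pipeline hierarchy, the sketch below resolves a pipeline name to its target Data Factory through metadata. Every name here (`DataFactoryTarget`, `FACTORIES`, `PIPELINE_TO_FACTORY`, `resolve_pipeline`, and all sample values) is an assumption made for the example and does not reflect the framework's actual tables or columns; only the Azure resource ID layout is standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFactoryTarget:
    """Hypothetical metadata record describing where a pipeline lives."""
    subscription_id: str
    resource_group: str
    factory_name: str
    # A reference to a credential only; the secret itself is never stored here.
    service_principal_ref: str

    def pipeline_resource_id(self, pipeline_name: str) -> str:
        """Build the Azure resource ID of a pipeline in this Data Factory."""
        return (
            f"/subscriptions/{self.subscription_id}"
            f"/resourceGroups/{self.resource_group}"
            f"/providers/Microsoft.DataFactory/factories/{self.factory_name}"
            f"/pipelines/{pipeline_name}"
        )


# Hypothetical sample metadata: two Data Factories in different subscriptions.
FACTORIES = {
    "df-ingest": DataFactoryTarget("sub-0001", "rg-ingest", "df-ingest", "spn-ingest"),
    "df-reporting": DataFactoryTarget("sub-0002", "rg-reporting", "df-reporting", "spn-reporting"),
}

# Hypothetical mapping of each pipeline to the Data Factory that hosts it.
PIPELINE_TO_FACTORY = {
    "IngestOrders": "df-ingest",
    "PublishReport": "df-reporting",
}


def resolve_pipeline(pipeline_name: str) -> DataFactoryTarget:
    """Look up the target Data Factory for a pipeline; raises KeyError if unknown."""
    return FACTORIES[PIPELINE_TO_FACTORY[pipeline_name]]
```

With this shape, a single control flow can address pipelines hosted in different Data Factories, and each target carries its own Service Principal reference, which is where the granular administration comes from.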

Framework Features

  • Granular metadata control.
  • Metadata integrity checking.
  • Global properties.
  • Dependency handling.
  • Execution restart-ability.
  • Parallel execution.
  • Full execution and error logs.
  • Operational dashboards.
  • Low cost orchestration.
  • Disconnection between framework and Worker pipelines.
  • Cross Data Factory control flows.
  • Pipeline parameter support.
  • Simple troubleshooting.
  • Easy deployment.
  • Email alerting.

ADFprocfwk.com

Contributors

Who Details
Paul Andrew @mrpaulandrew
paul@mrpaulandrew.com
https://mrpaulandrew.tech

Issues

If you've found a bug or have a new feature request, please log the details using the repository issues.

Go to... Issues

Projects

Go to... External Requests

Go to... Internal Backlog

Glossary

Go to... Glossary

Resources and Content

Blogs: mrpaulandrew.com/ADF.procfwk
GitHub: github.com/mrpaulandrew/ADF.procfwk
Twitter: #ADFprocfwk
Vlogs: youtube.com/mrpaulandrew

Release Details

Version Overview Related Blog(s) & Release Notes
1.7.1 Alerting Check Bug Fix added, plus:
  • Pipeline parameter value size limit removed.
ADF.procfwk v1.7.1 - Alerting Bug Fix And Pipeline Parameter Size Limit Removed
1.7 Pipeline Email Alerting added, plus:
  • Send email Function implemented and hardened.
  • Handy Notebook updates.
  • Activity failure paths improved.
  • MIT license and code of conduct added.
  • Error table bug fix: error code attribute changed from INT to VARCHAR.
ADF.procfwk v1.7 - Pipeline Email Alerting
1.6 Error Details for Failed Activities Captured, plus:
  • Pipeline parameters used at runtime captured in execution logs.
  • Emailing Function added, not yet implemented.
  • Unknown Worker outcomes optionally block downstream stages.
  • Solution housekeeping.
ADF.procfwk v1.6 - Error Details for Failed Activities Captured
1.5 Power BI Dashboard for Framework Executions, plus:
  • Worker Parallelism View.
  • Pipeline Run ID now logged.
  • Logging Attributes Bug Fix.
ADF.procfwk v1.5 - Power BI Dashboard for Framework Executions
1.4 Enhancements for Long Running Pipelines, plus:
  • Pipeline check status function added.
  • Function Data Factory client moved to internal class.
  • SQL GETDATE() changed to GETUTCDATE().
  • Glossary created.
  • Updated database views.
ADF.procfwk v1.4 - Enhancements for Long Running Pipelines
1.3 Metadata Integrity Checks, plus:
  • Logical pipeline predecessors.
  • Data Factory PowerShell deployment script.
  • Helper Notebook.
  • Database object renames and solution tidy up.
ADF.procfwk v1.3 - Metadata Integrity Checks
1.2 Execution Restartability, plus:
  • Data Factory annotations and descriptions.
  • Database covering indexes.
  • Pipeline log status changed from 'Started' to 'Preparing'.
  • Pipeline log start date/time now set in child pipeline.
ADF.procfwk v1.2 - Execution Restartability
1.1 Service Principal Handling via Metadata, plus:
  • Data Factory table.
  • Properties table and view.
  • Function body bug fix.
  • New sample data.
ADF.procfwk v1.1 - Service Principal Handling via Metadata
1.0 Simple framework designed and base components built.
Blog Series:
Creating a Simple Staged Metadata Driven Processing Framework for Azure Data Factory Pipelines
  • Part 1 - Design, concepts, service coupling, caveats, problems.
  • Part 2 - Database build and metadata.
  • Part 3 - Data Factory build.
  • Part 4 - Execution, conclusions, enhancements.