Hello friends and welcome to this full day workshop on Azure Data Factory. Today we will all become advanced factory workers! We thoroughly recommend this description when explaining your job to family members. But be warned: if you go on to tell them that the factory is in the cloud, you are likely to be branded as crazy. Here and now, however, that is ok. You are amongst like-minded geeky friends who all want to become cloud factory workers as well :-)
On a more serious note: throughout our day of training you will quickly notice that, as with most technologies, there are an awful lot of different ways to implement this Azure orchestration service, and understanding the best way to do something is often the biggest challenge. That said, if you take away only one thing from today, I ask that it be an appreciation of this fact. Then, when delivering solutions, take a step back from the requirements and think about the overall technical design and how Azure Data Factory should fit into your platform as a core component.
All too often with new and shiny services we start playing around and then try to make the technology fit our solution, rather than thinking about the solution requirements and which technology meets our needs. This is true of all developers; I don't want to preach, so I'm simply asking for a little mindfulness.
A Day Full of Azure Data Factory
To achieve any data processing in Azure you need an umbrella service to manage, monitor and schedule your solution. For a long time when working on premises, the SQL Agent has been our go-to tool, combined with T-SQL and SSIS packages. It’s now time to upgrade our skills and start using cloud native services to achieve the same thing on the Microsoft Cloud Platform. Within a PaaS-only Modern Data Platform, the primary component for delivering that orchestration is Azure Data Factory, combined with various other compute resources.
In this full day of training you’ll start with the basics and learn how to orchestrate your Azure Data Platform end to end. You will learn how to build Azure ETL/ELT pipelines using everything Data Factory has to offer. We’ll also consider hybrid architectures and dynamic design patterns, think about lifting and shifting legacy SSIS packages, and explore complex bootstrapping to orchestrate everything within your solution.
We’ll break down the content for this rich Azure PaaS resource as follows:
- Azure Data Factory fundamentals. What is it and why use it?
- Uploading data from on-premises to Azure.
- Using SSIS packages in Azure.
- Data Factory Mapping & Wrangling Data Flows.
- Dynamic metadata driven pipelines.
- Data Factory alerting, security and monitoring.
- Pipeline pricing.
- Data Factory CI/CD using Azure DevOps.
- Using Azure Data Factory in production.
If that's not enough content for one day, you will also get access to a set of hands-on labs that you can work through at your own pace. Whether you are new to Azure Data Factory or have some experience, you will leave this workshop with new skills and ideas for your projects.
Module 1: Data Factory Fundamentals
- What is it and why use it?
- Resource Components
- Common Activities
- Execution Dependencies (see the sketch after this list)
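
To make the activity and dependency concepts concrete before we get hands-on, below is a minimal sketch of a pipeline definition, written as a Python dict that mirrors the JSON behind the Data Factory authoring UI. The pipeline and activity names are illustrative, not from a real factory.

```python
# A minimal ADF pipeline definition, expressed as a Python dict mirroring the
# JSON you would see in the Data Factory authoring UI. Names are illustrative.
pipeline = {
    "name": "DependencyDemoPipeline",
    "properties": {
        "activities": [
            {
                # First activity: a simple Wait, standing in for real work.
                "name": "Stage Data",
                "type": "Wait",
                "typeProperties": {"waitTimeInSeconds": 10},
            },
            {
                # Second activity: runs only after "Stage Data" succeeds,
                # thanks to the execution dependency declared in dependsOn.
                "name": "Transform Data",
                "type": "Wait",
                "typeProperties": {"waitTimeInSeconds": 10},
                "dependsOn": [
                    {
                        "activity": "Stage Data",
                        "dependencyConditions": ["Succeeded"],
                    }
                ],
            },
        ]
    },
}
```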
Module 2: Uploading Data to Azure
- Integration Runtimes
- Azure IR
- Hosted IR (see the sketch after this list)
- Hosted IR Patterns
- Demo - Linked IRs
- Demo - Simple Data Upload
- Private Endpoints
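
As a taster for the hosted IR material, here is a hedged sketch of a linked service that reaches an on-premises SQL Server by routing traffic through a self-hosted integration runtime via the connectVia property. Again this is a Python dict mirroring the underlying JSON, and all names are illustrative.

```python
# Hypothetical linked service routed through a self-hosted integration runtime.
linked_service = {
    "name": "OnPremSqlServer",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            # Connection details elided; in practice this should come from a
            # secure store such as Azure Key Vault (covered in Module 6).
            "connectionString": "<connection-string>"
        },
        # connectVia tells Data Factory which integration runtime to use.
        "connectVia": {
            "referenceName": "MySelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}
```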
Module 3: Using SSIS Packages in Azure
- SSIS Integration Runtime
- Packages Running on PaaS
- Scaling Out Package Execution
- Demo - Scale Out Execution of Anything
Module 4: Data Flows
- Mapping Data Flows
- Demo - Building a Mapping Data Flow
- Wrangling Data Flows
- Demo - Using a Wrangling Data Flow
- Configuration
- Use Cases
Module 5: Metadata Driven Pipelines
- Expressions (see the sketch after this list)
- Dynamic Pipelines
- Demo - Data Discovery and Upload
- Demo - Simple Metadata and Upload
- Demo - Lazy SQLDB Replication
- Orchestration Framework - procfwk.com
- Demo - Framework Failure Handling & Restart
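
For a flavour of the expression language that powers these dynamic patterns, here is a hedged sketch of a Copy activity source whose query is assembled at runtime from pipeline parameters. The parameter names (SchemaName, TableName) are hypothetical, as is the source type.

```python
# A Copy activity source using ADF dynamic content. The "Expression" wrapper
# tells Data Factory to evaluate the value at runtime.
copy_source = {
    "source": {
        "type": "SqlServerSource",
        "sqlReaderQuery": {
            "value": (
                "@concat('SELECT * FROM ', "
                "pipeline().parameters.SchemaName, '.', "
                "pipeline().parameters.TableName)"
            ),
            "type": "Expression",
        },
    }
}
```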
Module 6: Monitoring, Alerting & Security
- Logging
- Alerting
- Demo - How To Build Alerting
- Using Azure Key Vault (see the sketch after this list)
- Access & Permissions
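
As a preview of the Key Vault content, the sketch below shows a linked service that resolves its connection string from an Azure Key Vault secret at runtime instead of storing it inline. All names are illustrative.

```python
# Hypothetical linked service whose connection string lives in Key Vault.
linked_service = {
    "name": "SqlDbViaKeyVault",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": {
                # Reference a secret held in a Key Vault linked service rather
                # than embedding the credential in the factory definition.
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "MyKeyVault",
                    "type": "LinkedServiceReference",
                },
                "secretName": "sql-connection-string",
            }
        },
    },
}
```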
Module 7: Pricing & Limitations
- Cost (see the worked example after this list)
- Activities
- Data Integration Units
- Data Flow Compute
- Wider Platform Orchestration
- Resource Limitations
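
To show how the main cost dimensions combine, here is an illustrative back-of-envelope calculation. The rates below are placeholders, not current Azure prices; always check the official Data Factory pricing page for your region before estimating real costs.

```python
# Placeholder rates for illustration only - NOT current Azure prices.
ACTIVITY_RUN_RATE = 1.00  # per 1,000 orchestration activity runs
DIU_HOUR_RATE = 0.25      # per Data Integration Unit hour of copy movement


def monthly_copy_cost(activity_runs: int, diu_hours: float) -> float:
    """Rough monthly estimate for a copy-heavy pipeline."""
    return (activity_runs / 1000) * ACTIVITY_RUN_RATE + diu_hours * DIU_HOUR_RATE


# e.g. a 10-activity pipeline run daily for 30 days, where each run's copy
# uses 4 DIUs for half an hour.
print(monthly_copy_cost(activity_runs=10 * 30, diu_hours=4 * 0.5 * 30))
```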
Module 8: CI/CD with Azure DevOps
- Source Control vs Developer UI
- ARM Template Deployments (see the sketch after this list)
- Demo - Basic Deployment via Azure DevOps
- PowerShell Deployments
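
The demo uses Azure DevOps, and we also cover PowerShell-based deployments; purely as a sketch of the same deployment step, here is roughly what pushing an exported factory ARM template looks like using the Python azure-mgmt-resource SDK. The subscription, resource group, deployment and factory names are hypothetical.

```python
import json

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

# Authenticate and target a (hypothetical) subscription.
client = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

# ARMTemplateForFactory.json is the template ADF exports when you publish.
with open("ARMTemplateForFactory.json") as f:
    template = json.load(f)

# Deploy the template incrementally into the target resource group.
poller = client.deployments.begin_create_or_update(
    "my-resource-group",
    "adf-release-1",
    {
        "properties": {
            "mode": "Incremental",
            "template": template,
            "parameters": {"factoryName": {"value": "my-target-factory"}},
        }
    },
)
poller.wait()  # block until the deployment completes
```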
Module 9: Data Factory in Production
- Testing (see the sketch after this list)
- Demo - Running NUnit Tests
- Bootstrapping
- Best Practices
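
The demo runs NUnit tests in C#; as an analogous sketch in Python, the pytest test below triggers a pipeline run with the azure-mgmt-datafactory SDK and asserts that it completes successfully. The subscription, resource group, factory and pipeline names are hypothetical.

```python
import time

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

RESOURCE_GROUP, FACTORY = "my-resource-group", "my-factory"


def test_upload_pipeline_succeeds():
    client = DataFactoryManagementClient(
        DefaultAzureCredential(), "<subscription-id>"
    )
    # Kick off a run of the (hypothetical) pipeline under test.
    run = client.pipelines.create_run(RESOURCE_GROUP, FACTORY, "UploadPipeline")

    # Poll until the run reaches a terminal state.
    while True:
        status = client.pipeline_runs.get(RESOURCE_GROUP, FACTORY, run.run_id).status
        if status not in ("Queued", "InProgress"):
            break
        time.sleep(15)

    assert status == "Succeeded"
```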
Module 10: Wrap Up
- Conclusions
- Questions
- Homework
Principal Consultant - Solution Architect & Data Platform MVP @ Altius Consulting Ltd
Paul is a Microsoft Data Platform MVP with 15+ years’ experience working with the complete on-premises SQL Server stack in a variety of roles and industries. Now, as an industry-leading consultant, he has turned his keyboard to big data solutions on the Microsoft cloud platform, specialising in all things data engineering (Data Factory, Databricks, Data Lake and Stream Analytics). Paul is also a STEM Ambassador for the networking education in schools’ programme, a PASS chapter leader, a member of the Data Relay committee, and a speaker and helper at SQL Bits, SQL Saturday, SQL Day, SQLGLA and PASS Summit.
You can contact Paul via:
- Email paul@mrpaulandrew.com
- Twitter @MrPaulAndrew
- LinkedIn linkedin.com/in/mrpaulandrew
- Blog mrpaulandrew.com
Senior Data Engineer @ Boomin
Richard is the author of the lab materials provided as part of the workshop. He is an experienced data engineer specialising in the Microsoft Azure and SQL Server data platforms. An active member of the Microsoft data platform community, Richard is a speaker, blogger, volunteer and Data Relay event organiser. His book Azure Data Factory by Example will be published by Apress in early 2021.
You can contact Richard via:
- Email richard@richardswinbank.net
- Twitter @richardswinbank
- LinkedIn linkedin.com/in/richardswinbank
- Blog richardswinbank.net