This is the code repository for Apache Airflow Best Practices, published by Packt.
A practical guide to orchestrating data workflows with Apache Airflow
With a practical approach and detailed examples, this book covers the newest features of Apache Airflow 2.x and its potential for workflow orchestration, operational best practices, and data engineering.
This book covers the following exciting features:
- Explore the new features and improvements in Apache Airflow 2.0
- Design and build data pipelines using DAGs (a minimal sketch follows this list)
- Implement ETL pipelines, ML workflows, and other advanced use cases
- Develop and deploy custom plugins and UI extensions
- Deploy and manage Apache Airflow in cloud environments such as AWS, GCP, and Azure
- Plan a path for scaling your environment over time
- Apply best practices for monitoring and maintaining Airflow
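To give a feel for the DAG-authoring style the book covers, here is a minimal pipeline sketch using the TaskFlow API from Airflow 2.x; the DAG ID and task here are illustrative examples, not code from the book's chapters:

    from datetime import datetime

    from airflow.decorators import dag, task

    @dag(schedule_interval=None, start_date=datetime(2024, 1, 1), catchup=False)
    def hello_airflow():
        """Hypothetical one-task pipeline that prints a greeting when triggered."""

        @task
        def say_hello():
            print("Hello from Airflow 2.x!")

        say_hello()

    hello_airflow()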
If you feel this book is for you, get your copy today!
All of the code is organized into folders. For example, Chapter-04.
The code will look like the following:
    from airflow.plugins_manager import AirflowPlugin

    # metrics_blueprint and MetricsDashboardView are defined elsewhere in the chapter's code
    class MetricsPlugin(AirflowPlugin):
        """Plugin class that registers the metrics dashboard with Airflow."""
        name = "Metrics Dashboard Plugin"
        flask_blueprints = [metrics_blueprint]
        appbuilder_views = [{
            "name": "Dashboard",
            "category": "Metrics",
            "view": MetricsDashboardView(),
        }]
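This plugin references a Flask blueprint and an Airflow AppBuilder view that are defined elsewhere in the chapter's source. As a rough sketch of what those definitions could look like, assuming a templates/metrics_dashboard.html file alongside the plugin (the route and names are illustrative, not the book's exact code):

    from flask import Blueprint
    from flask_appbuilder import BaseView, expose

    # Hypothetical blueprint that serves the dashboard's templates and static assets
    metrics_blueprint = Blueprint(
        "metrics_plugin",
        __name__,
        template_folder="templates",
        static_folder="static",
    )

    class MetricsDashboardView(BaseView):
        """Hypothetical AppBuilder view that renders the dashboard page."""

        default_view = "dashboard"

        @expose("/dashboard")
        def dashboard(self):
            # Assumes templates/metrics_dashboard.html exists next to the plugin module
            return self.render_template("metrics_dashboard.html")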
Following is what you need for this book: This book is for data engineers, developers, IT professionals, and data scientists who want to optimize workflow orchestration with Apache Airflow. It's perfect for those who recognize Airflow's potential and want to avoid common implementation pitfalls. Whether you're new to data engineering, an experienced professional, or a manager seeking insights, this guide will support you. A functional understanding of Python, some business experience, and basic DevOps skills are helpful. While prior experience with Airflow is not required, it is beneficial.
The code and examples in this book were developed with the assumption that you have access to Docker and Docker Compose. We also assume a passing familiarity with Python, Kubernetes, and Docker.
With the following software and hardware list, you can run all of the code files present in the book (Chapters 1-13).
| Chapter | Software required | OS required |
| --- | --- | --- |
| 2-12 | Airflow 2.0+ | Windows, macOS, or Linux |
| 2-12 | Python 3.9+ | Windows, macOS, or Linux |
| 2-12 | Docker | Windows, macOS, or Linux |
| 2-12 | PostgreSQL | Windows, macOS, or Linux |
Dylan Intorf is a solutions architect and data engineer with a BS in Computer Science from Arizona State University. He has over 10 years of experience in the software and data engineering space, delivering custom-tailored solutions to the tech, financial, and insurance industries.
Dylan Storey holds a B.Sc. and an M.Sc. in Biology from California State University, Fresno, and a Ph.D. in Life Sciences from the University of Tennessee, Knoxville, where he leveraged computational methods to study a variety of biological systems. He has over 15 years of experience building, growing, and leading teams and solving problems in the development and operation of data products at a variety of scales and industries.
Kendrick van Doorn is an engineering and business leader with a background in software development and over 10 years of experience developing technology and data strategies at Fortune 100 companies. In his spare time, he enjoys taking classes at different universities and is currently an MBA candidate at Columbia University.