/demo_dp203_by_wilson-mok

In this repository, you will find various demos and presentations I have delivered throughout the years. This includes links to the videos, the source code and the data files.


Demo

In this repo, you will find the instructions and source code for my presentations and demos.

Presentations

Azure Databricks | Azure Data Factory | Building streaming data pipelines with Medallion architecture

We have reached the fourth and final part of the Azure Databricks series. In this session, I will cover:

  • Design and implement streaming pipelines using Azure Databricks and Azure Data Factory.
  • A quick introduction to Delta Live Tables in Azure Databricks.

After this session, you will be able to create your own streaming data pipeline for your Lakehouse using Azure Databricks and Azure Data Factory.

Source code folder: Azure Databricks - Building a streaming data pipeline
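
As a rough companion to this session, a minimal PySpark sketch of the bronze-to-silver streaming step might look like the following; the paths, table layout, and column names are illustrative assumptions, not the exact code from the notebooks.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, current_timestamp

# On Databricks a SparkSession named `spark` already exists; this line makes
# the sketch runnable elsewhere as well.
spark = SparkSession.builder.getOrCreate()

# Read the bronze Delta table as a stream (path is a placeholder).
bronze_stream = spark.readStream.format("delta").load("/mnt/lake/bronze/sales")

# Light transformation: drop malformed rows and stamp the processing time.
silver_stream = (
    bronze_stream
    .where(col("order_id").isNotNull())
    .withColumn("processed_at", current_timestamp())
)

# Write to the silver layer with a checkpoint so the stream can restart safely.
(
    silver_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/lake/_checkpoints/sales_silver")
    .outputMode("append")
    .start("/mnt/lake/silver/sales")
)
```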

Azure Databricks | Azure Data Factory | Develop data pipelines for a Medallion delta lake using Azure Databricks and Azure Data Factory

In this session, I will cover:

  • What is a Medallion Delta Lake architecture?
  • Design and demo Auto Loader and Structured Streaming (micro-batching) for batch processing using Azure Data Factory and Azure Databricks.
  • Using Databricks Serverless SQL to serve data to Power BI.

After this session, you will be able to create your own streaming solution using Azure Databricks and Azure Data Factory.

Source code folder: Azure Databricks - MSDEVMTL1 - Develop data pipelines for a Medallion Delta Lake
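
For a feel of the Auto Loader pattern covered in this session, here is a minimal PySpark sketch; the landing paths, file format, and checkpoint locations are assumptions for illustration only.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided as `spark` on Databricks

# Auto Loader incrementally picks up new files landing in the raw container.
raw_stream = (
    spark.readStream
    .format("cloudFiles")                       # Auto Loader source
    .option("cloudFiles.format", "json")        # format of the incoming files
    .option("cloudFiles.schemaLocation", "/mnt/lake/_schemas/orders")
    .load("/mnt/raw/orders")                    # placeholder landing path
)

# Micro-batch the stream into a bronze Delta table; a one-shot availableNow
# run is the kind of job an Azure Data Factory schedule would kick off.
(
    raw_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/lake/_checkpoints/orders_bronze")
    .trigger(availableNow=True)
    .start("/mnt/lake/bronze/orders")
)
```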

Azure Databricks | Building a Lakehouse | Medallion architecture

This session is the third installment of a four-part series. In this session, I will discuss:

  • Create a Lakehouse with Medallion architecture.
  • Create a data model in the gold layer to share with multiple projects.
  • Implement an Azure Data Factory pipeline to automate the orchestration of the Databricks notebooks.
  • Use Databricks SQL to connect to Power BI for reporting.

After this session, you will be able to create your own Lakehouse using Azure Databricks and Azure Data Factory.

Source code folder: Azure Databricks - Building a Lakehouse
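
A gold-layer table of the kind discussed in this session can be built with a short aggregation over the silver data. The sketch below is a hypothetical PySpark example; the table paths and columns are not the repo's actual ones.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import sum as sum_, countDistinct

spark = SparkSession.builder.getOrCreate()  # `spark` is predefined on Databricks

# Read the curated silver table (placeholder path).
silver_sales = spark.read.format("delta").load("/mnt/lake/silver/sales")

# Aggregate into a reporting-friendly gold table shared across projects.
gold_daily_sales = (
    silver_sales
    .groupBy("order_date", "region")
    .agg(
        sum_("amount").alias("total_amount"),
        countDistinct("order_id").alias("order_count"),
    )
)

# Persist to the gold layer; Power BI then reads it through Databricks SQL.
(
    gold_daily_sales.write
    .format("delta")
    .mode("overwrite")
    .save("/mnt/lake/gold/daily_sales")
)
```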

Azure Databricks | Azure Data Factory | Building your first data pipeline

This session is the second installment of a four-part series. In this session, I will discuss:

  • Introduce Delta Lake format.
  • Discuss different types of Databricks clusters and associated costs.
  • Introduce Azure Data Factory and how to schedule your data pipeline.
  • A demo of using Azure Data Factory and Databricks to schedule your data pipeline.

At the end of this session, you will be able to create your own data pipeline and create a schedule for it to run automatically using Azure Databricks and Azure Data Factory.

Source code folder: Azure Databricks - Building your first pipeline
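
To illustrate the Delta Lake format introduced in this session, here is a minimal PySpark sketch of landing a CSV file as a Delta table; the paths and dataset are placeholders, not the notebooks' actual inputs.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # `spark` already exists on Databricks

# Load a raw CSV file (placeholder path), letting Spark infer the schema.
raw_df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/raw/trips.csv")
)

# Write it as a Delta table: Parquet files plus a transaction log, which adds
# ACID guarantees and time travel on top of the data lake.
raw_df.write.format("delta").mode("overwrite").save("/mnt/lake/bronze/trips")

# The resulting table can be queried back with SQL or the DataFrame API.
spark.read.format("delta").load("/mnt/lake/bronze/trips").show(5)
```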

Azure Databricks | Getting started with Azure Databricks

This session is the first installment of a four-part series on Azure Databricks. In this session, I will discuss:

  • Introduce Databricks and its features.
  • Compare the Databricks Community edition and the Azure Databricks Premium edition.
  • Provide a brief tour of the Databricks UI.
  • A demo of using Databricks to query data using PySpark.

At the end of this session, you will gain a basic understanding of what Databricks is and how it can be used for big data processing and analytics.

Source code folder: Azure Databricks - Getting started
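
The PySpark demo in this introductory session follows a pattern roughly like the sketch below; the file path and column names are invented for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # Databricks notebooks provide `spark`

# Read a sample CSV into a DataFrame (placeholder path and columns).
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/raw/cities.csv")
)

# A simple transformation and aggregation to show the DataFrame API.
result = (
    df.groupBy("country")
      .count()
      .orderBy("count", ascending=False)
)

result.show(10)
```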

Azure Synapse | Data Warehousing | Dimensional Data Model | Mapping Data Flow

In this session, I will discuss the best practices for data modeling and the process of creating a data pipeline using Mapping Data Flow in Synapse Analytics.

This includes:

  • What is Data Warehousing?
  • How to design and create a Dimensional data model?
  • A demo of using Azure Synapse Analytics to create a data pipeline to store data into the data warehouse.

This is the second part of a two-part series on Azure Synapse Analytics.

Source code folder: Azure Synapse - Data Warehousing
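
The demo itself uses Mapping Data Flow, which is a visual tool, but the underlying dimensional-model load can be sketched in PySpark for readers who prefer code. The table names, keys, and paths below are assumptions, not the session's actual pipeline.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Staged source data and an already-loaded customer dimension (placeholder paths).
stg_orders = spark.read.parquet("/lake/staging/orders")
dim_customer = spark.read.parquet("/lake/dw/dim_customer")

# Resolve the business key to the dimension's surrogate key, then keep only
# the fact table's foreign keys and measures.
fact_orders = (
    stg_orders
    .join(dim_customer, on="customer_id", how="left")
    .select("customer_sk", "order_date", "quantity", "amount")
)

# Append the new facts to the warehouse fact table.
fact_orders.write.mode("append").parquet("/lake/dw/fact_orders")
```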

Azure Synapse | Data Exploration | Serverless SQL | Serverless Spark

This is the first part of a two-part series on Azure Synapse Analytics. In this session, I will guide you through the best practices for data exploration. This includes:

  • Overview of Azure Synapse Analytics
  • How to conduct a data exploration?
  • A demo of using Azure Synapse to prepare, clean and analyze data to create insight using Serverless SQL and Serverless Spark.

Source code folder: Azure Synapse - Data Exploration
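
As a rough illustration of the Serverless Spark side of this session, a first-pass data exploration in PySpark might look like the sketch below; the storage account, path, and column names are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, when

spark = SparkSession.builder.getOrCreate()  # provided automatically in a Synapse Spark notebook

# Load raw files from the data lake (placeholder storage account and path).
df = spark.read.parquet("abfss://raw@mydatalake.dfs.core.windows.net/taxi/")

# First look: schema, row count, and per-column null counts.
df.printSchema()
print("rows:", df.count())
df.select(
    [count(when(col(c).isNull(), c)).alias(c) for c in df.columns]
).show()

# Basic cleaning and a quick aggregate to start forming insights.
clean_df = df.dropDuplicates().na.drop(subset=["trip_id"])
clean_df.groupBy("payment_type").count().orderBy("count", ascending=False).show()
```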

Azure Data Factory | Git | CI/CD

This is the second part of the Azure Data Factory series. In this session, I will guide you through the best practices for code management and code deployment for Azure Data Factory. This includes:

  • Setting up a Git repository in Azure Data Factory.
  • What is the continuous integration and continuous delivery (CI/CD) process?
  • A demo of creating an Azure DevOps CI/CD pipeline for Azure Data Factory.

Source code folder: Azure Data Factory - CI/CD

Azure Data Factory | Data pipeline | Mapping Data Flow

In this session, I will guide you through the best practices and the process of creating a data pipeline using Mapping Data Flow in Azure Data Factory. This includes:

  • Overview of Azure Data Factory
  • How to design a data pipeline?
  • A demo of creating an end-to-end data pipeline.

Source code folder: Azure Data Factory - Data Pipeline