edw-best-practices

Git Repo for EDW Best Practice Assets on the Lakehouse

Primary Language: Python | License: MIT


This Git project provides a framework of example notebooks that show typical data warehousing SQL users how to build pipelines and analytics on the Lakehouse. Broken out into five steps, the notebooks walk the user through a single use case that they can run in their own Databricks environment, leading them along the data maturity curve as follows:

  • Step 1 - Build a classical batch-oriented SQL pipeline with best practices on the Lakehouse
  • Step 2 - Build the above in Delta Live Tables and automate all orchestration
  • Step 3 - Build and analyze summary analytics tables
  • Step 4 - Create gold views
  • Step 5 - Convert an old batch pipeline to a streaming pipeline
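As a rough illustration of the kind of batch-oriented SQL pattern Step 1 refers to, the sketch below builds a Delta Lake `MERGE INTO` upsert statement in Python. It is not taken from the repo's notebooks: the helper name `build_merge_sql` and the table and column names are hypothetical, and the notebooks may structure their pipelines differently.

```python
def build_merge_sql(target, source, key_cols, update_cols):
    """Return a MERGE statement that upserts rows from `source` into `target`.

    Hypothetical helper for illustration only; not part of the repo.
    """
    if not key_cols or not update_cols:
        raise ValueError("key_cols and update_cols must both be non-empty")

    # Rows match when every key column is equal.
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    # Matched rows get their non-key columns overwritten from the source.
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in update_cols)
    # Unmatched rows are inserted with keys followed by the other columns.
    all_cols = list(key_cols) + list(update_cols)
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"s.{c}" for c in all_cols)

    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


# Example with made-up table names.
merge_sql = build_merge_sql(
    target="silver_sensors",
    source="bronze_sensors",
    key_cols=["id"],
    update_cols=["value"],
)
print(merge_sql)
```

In a Databricks notebook, the resulting string could be passed to `spark.sql(merge_sql)`. Re-running the same batch leaves the target unchanged, which is one reason upserts are a common choice for batch pipelines.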

This Git repo also provides examples of more advanced use cases, such as using the Delta Change Data Feed.
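For readers unfamiliar with the Change Data Feed, the following minimal sketch shows the two SQL statements typically involved: enabling the feed on a Delta table and reading row-level changes with the `table_changes` function. The Python helper names and the table name are assumptions made for illustration; the repo's own examples may look different.

```python
def enable_cdf_sql(table):
    """SQL that turns on the Change Data Feed for an existing Delta table."""
    return (
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


def read_changes_sql(table, start_version, end_version=None):
    """SQL that reads changes between table versions.

    Hypothetical helper; `end_version` is optional, matching the optional
    third argument of `table_changes`.
    """
    args = [f"'{table}'", str(int(start_version))]
    if end_version is not None:
        args.append(str(int(end_version)))
    return f"SELECT * FROM table_changes({', '.join(args)})"


# Example with a made-up table name.
print(enable_cdf_sql("silver_sensors"))
print(read_changes_sql("silver_sensors", 2))
print(read_changes_sql("silver_sensors", 2, 5))
```

On Databricks these strings could be run with `spark.sql(...)`. The result of `table_changes` includes metadata columns such as `_change_type`, `_commit_version`, and `_commit_timestamp` alongside the table's own columns, and changes are only recorded from the version at which the feed was enabled.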

The repo also provides helper functions that make ETL easier in production pipelines.