This DBT project ingests data from the Northwind database into an Amazon Redshift data warehouse and transforms it into analytics tables. Data Sources
The Northwind database contains sample data for a fictional company. It has the following tables:
Categories - Product categories
Customers - Customers
Employees - Employees
OrderDetails - Details of each order
Orders - High level order data
Products - Products sold by the company
Shippers - Shipping companies
Suppliers - Suppliers of products
The data is extracted from the Northwind database into CSV files and stored in an S3 bucket for loading into Redshift.
- A Redshift cluster was created on AWS and made publicly accessible so DBT can connect to it.
- The Northwind database schema is created in Redshift using DDL scripts.
- The CSV data files are loaded from S3 into the Redshift tables using COPY commands.
The following DBT models are created:
stg_customers - De-duplicates customer records, key is company_name + contact_name
stg_employees - Adds computed columns for age, length of service, full name
stg_orders - Adds computed columns for order total and discount amount
mart_denorm - Denormalized table joining products, suppliers, categories, orders, customers etc.
mart_partitioned - Partitioned copy of orders table by year (2020, 2021, 2022)
Clone this repo
Configure Redshift credentials in profiles.yml
Run dbt run to build all models
Run dbt test to run tests
Refer to the DBT documentation for more!
If you have any questions, feel free to contact me at: