This project is about building a dimensional data warehouse in BigQuery by transforming an OLTP system to an OLAP system, using dbt as our data transformation tool.
The OLTP system contains the sales data of Northwind; a specialty food export-import company.
To bring Northwind's data reporting up to speed by embracing Dimensional Modeling.
What's the current setup or architecture?
- Northwind traders are companies that buy and sell special foods worldwide.
- This is a practice database made by Microsoft to showcase its product features and for learning purposes.
- The current setup combines on-site and older systems.
- They use MySQL for their main daily sales transactions.
- MySQL is also used for creating and running reports, but it's not efficient because analytical queries slow down the transaction system.
Why do we need a new setup or architecture?
- To scale up more easily.
- To make reports faster.
- To ease the burden on day-to-day operations.
- To enhance data security with better access control.
How do we set up a new system or architecture?
- Northwind traders can shift their current database to GCP.
- The on-site MySQL can be swapped with a fully managed cloud SQL.
- To handle reports, we'll create an OLAP data warehouse on GCP using BigQuery.
- We'll construct a Dimensional Data Warehouse on BigQuery following Kimball's method, incorporating dim and fact tables.
Throughout the interview process with the business and stakeholders, the following business processses were identified:
- Sales Overview: Overall sales reports to understand better, what is being sold to our customers, what sells the most, where and what sells the least, the goal is to have a general overview of how the business is going.
- Product Inventory: Understand the current inventory levels and how to improve stock management, what suppliers we have, and how much is being purchased. This will allow Northwind to understand stock management and potentially land better deals with suppliers
- Customer Reporting: Allow customers to understand their purchase orders, how much and when they are buying, empowering them to make data-driven decisions while Northwind utilizes this data in combination with its sales data.
This means the business is looking forward to getting insights on sales overview, product inventory, and customer reporting.
From the above ERD diagram of the OLTP transactional system, we identify the following required tables that will enable us to meet the business requirements:After a series of interviews with the business stakeholders and understanding their business requirements, we proposed the following data modeling concepts:
- Conceptual Data Model
- Logical Data Model
- Physical Data Model
Below is the proposed data warehouse architecture, which focuses on how the data flows from the old MySQL OLTP database to the new modernized dimensional data warehouse in BigQuery during the migration.
With the insights from the data modeling and the data warehouse architecture design, we go ahead to create the three layers (datasets) in Bigquery using dbt. These (staging, warehouse, and analytics_obt or reporting) layers are identified by the "dbt prefix".
The above three data layers help to achieve the business requirements of Northwind and Sales Overview, Product Inventory, and Customer Reporting processes can now be carried out effectively to draw out insights.
- The new Data Warehouse uses Bigquery for analytics and Business Intelligence which is more efficient than the previous MySQL system.
- The Reporting is derived from One Big Table denormalised from Dimensional models
- Sales Overview, Product Inventory, and Customer Reporting processes can now be carried out effectively to draw out insights
- Commands to install dbt and connect to bigquery here
- Commands to create tables and insert data here
- Commands to create Dim and Fact tables in different layers can be found here
- If you are not able to enable billing for Bigquery on your account, insert data manually by uploading csv files located here
- Learn more about dbt in the docs
- Check out Discourse for commonly asked questions and answers
- Join the chat on Slack for live discussions and support
- Find dbt events near you
- Check out the blog for the latest news on dbt's development and best practices
Find the Medium article I wrote about this project here .
And finally, click the analytics_engineering folder to see all the folders and files created during this project. 🌟