Data mesh is an architectural approach that advocates for breaking down monolithic data architectures into smaller, more manageable components. These components are designed to serve specific business domains and expose datasets to data consumers through designated data contracts. This approach simplifies the maintenance and governance of data in complex dbt implementations.
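In dbt, such a data contract can be expressed in a model's YAML properties. A minimal sketch, assuming a Product Catalog model in the Inventory project (column names and data types are illustrative, not taken from the actual Teddy Retailers projects):

```yaml
# models/schema.yml -- hypothetical contract for an Inventory model
version: 2

models:
  - name: product_catalog
    config:
      contract:
        enforced: true   # dbt fails the build if the model's output drifts from this shape
    columns:
      - name: product_id
        data_type: integer
      - name: product_name
        data_type: varchar
      - name: category
        data_type: varchar
      - name: market_price
        data_type: decimal
```

With the contract enforced, consumers in other domains can rely on the column names and types staying stable.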
Data mesh architecture is recommended in scenarios where:
- Traditional single-project dbt setups become complex and difficult to maintain.
- There's a need to accommodate diverse data requirements across different organizational domains.
- Simplified data governance and democratization are essential for enhancing data accessibility and usability.
The repository comprises multiple dbt projects representing different business domains:
- Inventory: Models related to product catalog and stock management.
- Sales: Models related to customer data and order management.
- Finance: Models related to financial data analysis.
Each project exposes specific datasets and functionalities tailored to its respective domain.
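Recent dbt versions let a project mark which models form its public interface via the `access` property. A hedged sketch for the Sales project (the `stg_customers` staging model is an assumed name for illustration; `customer_directory` is one of the models described below):

```yaml
# models/schema.yml -- illustrative access configuration
version: 2

models:
  - name: customer_directory
    access: public    # referenceable from other domain projects
  - name: stg_customers
    access: private   # internal to the Sales domain
```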
The `references` directory contains standalone SQL scripts to run in your database client. These scripts include commands for creating databases, loading sample data, and other necessary setup tasks.
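As a rough illustration of the kind of script found there, the following is a hedged sketch (database name, space allocations, and sample row are placeholders, not the repository's actual scripts), using the `orders` columns described below:

```sql
-- Hypothetical setup script: create a domain database and load a sample row.
CREATE DATABASE teddy_sales AS PERMANENT = 50e6, SPOOL = 100e6;

CREATE TABLE teddy_sales.orders (
    order_id    INTEGER,
    customer_id INTEGER,
    status      VARCHAR(20),
    order_date  DATE
);

INSERT INTO teddy_sales.orders VALUES (1, 100, 'shipped', DATE '2024-01-15');
```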
The example scenario provided in the article revolves around Teddy Retailers, an organization managing data related to the inventory, sales, and finance domains. Operational sources and data requirements are outlined to illustrate how data mesh architecture addresses specific business needs. The operational source tables are:
- Product Catalog: Contains product ID, name, category, and current market price.
- Stock Entries: Includes entry ID, product ID, quantity, and purchase price.
- Orders: Contains order ID, customer ID, status, and date.
- Order Products: Includes transaction ID, order ID, product ID, and quantity.
The data requirements are modeled per domain:
- Product Catalog: Models product information including ID, name, category, and price.
- Stock Per Calendar Month: Calculates the quantity of each product acquired each month and the average acquisition price of each product in that period.
- Customer Directory: Models customer information and calculates each customer's total lifetime value (TLV).
- Order Summary: Calculates the total value of each order.
- Order Profit and Loss: Calculates the total cost and value of orders.
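A model like Order Profit and Loss sits naturally in the Finance project and can build on datasets exposed by the other domains through dbt's two-argument `ref()`. A hedged sketch (the `order_costs` model and all column names are assumed for illustration; `order_summary` is the Sales model described above):

```sql
-- Hypothetical Finance model: models/order_profit_and_loss.sql
-- The two-argument ref() points at models exposed by other projects.
select
    orders.order_id,
    orders.order_value,
    orders.order_value - costs.order_cost as order_profit
from {{ ref('sales', 'order_summary') }} as orders
join {{ ref('inventory', 'order_costs') }} as costs
  on orders.order_id = costs.order_id
```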
To set up the development environment, follow these steps:
- Navigate to the `projects` directory in your local data mesh repository.
- Create and activate a Python virtual environment (at the time of writing, the latest Python version supported is 3.11).
- Install dbt-teradata: `pip install dbt-teradata`.
- Install the dbt-loom plugin: `pip install dbt-loom`.
- Copy the contents of `projects/profiles-sample.yml` to the `.dbt/profiles.yml` file in your home directory. If this file doesn't exist on your system, you need to create it.
- Edit the `.dbt/profiles.yml` file with the appropriate host, username, and password according to your Teradata Vantage configuration for the three projects.
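For orientation, a dbt-teradata profile generally looks like the following sketch. The profile name, schema, and placeholder credentials are assumptions; defer to the repository's `profiles-sample.yml` for the actual names:

```yaml
# ~/.dbt/profiles.yml -- hedged sketch for one of the three projects
teddy_sales:
  target: dev
  outputs:
    dev:
      type: teradata
      host: <your-vantage-host>
      user: <your-username>
      password: <your-password>
      schema: teddy_sales
      tmode: ANSI
      threads: 1
```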
To run the data mesh, follow these steps:
- Navigate to the directory of each data mesh project.
- With your virtual environment activated, execute the following commands:
  - `dbt debug` to ensure proper connection to your Teradata Vantage instance.
  - `dbt run` to run the project.
- When executing a referencing project, observe dbt-loom injecting the referenced models as part of the DAG execution of the referencing project.
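dbt-loom discovers a referenced project's models through that project's compiled manifest. At the time of writing, it reads a `dbt_loom.config.yml` file in the referencing project, roughly like this sketch (the project name and relative path are illustrative):

```yaml
# dbt_loom.config.yml -- hedged sketch for a project that references Sales models
manifests:
  - name: sales                 # project name used in ref('sales', ...)
    type: file
    config:
      path: ../sales/target/manifest.json   # produced by running dbt in the Sales project
```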
dbt-loom, as a free and open-source project, specifies certain caveats related to the use of dbt plugins. Notably, the documentation and lineage generated by `dbt docs generate` don't reflect in detail the provenance of referenced models with respect to their home project.