This project centers on a procurement-themed business case for a luxury resort company that has resort properties across the USA and now in France. It uses dbt and Snowflake to demonstrate a data pipeline that shows how we can:
- Leverage Snowflake as a secure and scalable platform to deliver the entire data pipeline
- Use Snowflake Cortex AI features to turn text and unstructured documents into meaningful insights that drive actions
- Bring LLMs to where the data reside
- Orchestrate the workflow with dbt, which also provides documentation, code collaboration and reusability, job scheduling, CI/CD, and a host of other development benefits
- Use dbt incremental models to streamline building models that rely on Cortex functionality
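
The incremental idea in the last bullet can be illustrated outside of dbt. The following is a minimal Python sketch, not the project's dbt code: `fake_translate` is a hypothetical stand-in for an expensive Cortex call, and the row layout (`line_id`, `description`) is assumed for illustration. Only rows whose key is not yet in the target are sent to the expensive function, so rows that were already processed are never processed twice.

```python
# Records every call so we can see how often the "expensive" function runs.
calls = []

def fake_translate(text: str) -> str:
    """Hypothetical stand-in for an expensive LLM call (not a real Cortex API)."""
    calls.append(text)
    return text.upper()

def incremental_run(source_rows, target, expensive_fn):
    """Process only source rows whose line_id is missing from target.

    target maps line_id -> processed description and is updated in place.
    Returns the number of newly processed rows.
    """
    new_rows = [r for r in source_rows if r["line_id"] not in target]
    for row in new_rows:
        target[row["line_id"]] = expensive_fn(row["description"])
    return len(new_rows)

# Assumed sample data for illustration only.
source = [
    {"line_id": 1, "description": "linge de lit"},
    {"line_id": 2, "description": "savon"},
]
target = {}

first = incremental_run(source, target, fake_translate)   # both rows are new

source.append({"line_id": 3, "description": "serviettes"})
second = incremental_run(source, target, fake_translate)  # only row 3 is new
```

After the second run, `fake_translate` has been called three times in total rather than five, which is the saving an incremental build aims for.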
This project contains the following key flows:
- Sources - Procurement source data from different ERP systems is simulated for the USA and France, along with supplier contracts
- Key Intermediate Model Flows
  - Invoice/Order Merging - Invoice and order details are merged to create a more holistic view of invoice transactions (US and France processes are kept separate for business reasons)
  - Currency Conversion - France line items are joined with FX rates to convert invoice amounts into USD
  - Language Translations - France line item descriptions are translated into English using Cortex
  - Spend Classification - Line descriptions are classified into spend categories using Cortex
  - Consolidated US + France Transactions - US and France transactions are consolidated into a single view
  - Contract Vector Database Loading - Contracts are processed using a UDF that calls a FastAPI application deployed via Snowpark Container Services
  - Contract Key Term Extraction - Filtered retrieval augmented generation (RAG) is used to provide Cortex with context for contract key term extraction
- Marts
  - Invoice Order Line Detail Fact Table - Provides a wide view of orders for downstream analytics consumption
  - Supplier Metrics - Provides a view of every supplier in the system, including key metrics and contract terms
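
As an illustration of the Currency Conversion flow listed above, here is a minimal Python sketch of joining line items to FX rates and converting amounts to USD. It is not the project's model code: the field names, the join key (currency plus invoice date), the rounding rule, and the sample rates are all assumptions made for the example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed sample FX rates keyed by (currency, rate date); values are illustrative.
fx_rates = {
    ("EUR", "2024-01-15"): Decimal("1.10"),
    ("EUR", "2024-01-16"): Decimal("1.08"),
}

# Assumed sample France line items; column names are hypothetical.
line_items = [
    {"invoice_id": "FR-1", "invoice_date": "2024-01-15",
     "currency": "EUR", "amount": Decimal("100.00")},
    {"invoice_id": "FR-2", "invoice_date": "2024-01-16",
     "currency": "EUR", "amount": Decimal("250.50")},
]

def to_usd(item, rates):
    """Look up the rate for the item's currency and date, then convert to USD."""
    rate = rates[(item["currency"], item["invoice_date"])]
    usd = (item["amount"] * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {**item, "fx_rate": rate, "amount_usd": usd}

converted = [to_usd(item, fx_rates) for item in line_items]
```

`Decimal` is used instead of `float` so that monetary amounts are rounded predictably to two places.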
Business Value (Language Translations):
- Downstream users get a single language to power insights using Snowflake Cortex
- Incremental models provide efficient orchestration and prevent duplicate Cortex processing
Business Value (Spend Classification):
- Downstream users get spend classification to power insights using Snowflake Cortex
- Incremental models provide efficient orchestration and prevent duplicate Cortex processing
Business Value (Contract Vector Database Loading):
- Key concepts from large unstructured documents are turned into structured outputs for actionable supplier insights
- dbt efficiently orchestrates the data workflow
Business Value (Contract Key Term Extraction):
- Key concepts from large unstructured documents are turned into structured outputs for actionable supplier insights
- dbt efficiently orchestrates the data workflow to facilitate retrieval augmented generation
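
The filtered retrieval augmented generation used for contract key term extraction can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: the two-dimensional vectors are toy embeddings, `supplier_id` is an assumed filter field, and `build_prompt` is a hypothetical helper. The point is the order of operations: filter chunks by metadata first, rank the remainder by similarity, then pass only those chunks to the model as context.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Assumed contract chunks with toy embeddings; all values are illustrative.
chunks = [
    {"supplier_id": "S1", "text": "Payment terms: net 30 days.", "vec": [1.0, 0.0]},
    {"supplier_id": "S1", "text": "Governing law: New York.", "vec": [0.0, 1.0]},
    {"supplier_id": "S2", "text": "Payment terms: net 60 days.", "vec": [1.0, 0.0]},
]

def retrieve(query_vec, supplier_id, top_k=1):
    """Filter to one supplier's chunks, then return the top_k most similar."""
    candidates = [c for c in chunks if c["supplier_id"] == supplier_id]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return ranked[:top_k]

def build_prompt(question, context_chunks):
    """Hypothetical helper: assemble the retrieved text and the question."""
    context = "\n".join(c["text"] for c in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"

hits = retrieve([0.9, 0.1], "S1")
prompt = build_prompt("What are the payment terms?", hits)
```

Because the filter runs before ranking, the equally similar chunk from supplier `S2` never reaches the prompt, which keeps one supplier's contract terms from leaking into another's extraction.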