The RA Data Warehouse is a framework for ingesting, combining and restructuring data from multiple source systems into a conformed, dimensional data warehouse. The framework is based around dbt ("Data Build Tool"), open-source toolkit for templating and orchestrating SQL-based data transformations of RAW API-sourced data into structures suitable for analysis, and pre-built transformations and design patterns taken from Rittman Analytics' previous data warehousing consulting experience.
- Rittman Analytics homepage
- Centralize and Connect your Data consulting packages
- Rittman Analytics Blog dbt Posts
- Contact Us
dbt is a toolkit that solves the problem of testing, repeatability and modularity of analysts code by bringing the principles of modern software development to the analysts' workflow. The RA Data Warehouse framework solves the problem of how to design your dbt transformations so that your project doesn't grind to a halt after you integrate your second, third, fourth data source because you need to combine identity across multiple systems, deduplicate multiple sources of customer data and make sure that numbers coming out of your BI tool still match with the numbers in your source systems.
- Getting Started with dbt consulting packages
- dbt Viewpoint
- dbtCloud for scheduling and orchestrating dbt and the RA Data Warehouse
- For Rittman Analytics' team, to provide some standards around how we model and transform various data sources
- To make it simpler to run data quality tests than to not, by defining these tests in-advance
- To enable merging of customer, product, contact and other shared entity data with no single authoratitive source
- To pre-create derived analytics measures for individual and combinations of sources
- To create a means of selecting sources or subject areas ("modules") and have just those sources/modules loaded (and deployed for a customer)
- To enable use of either Stitch, Fivetran or Segment as the pipeline technology based on client need
- To enable loading and integration of custom (customer app database) sources into the warehouse
- Dimension Merge and Deduplication Across Multiple Data Sources
- Selectable Data Sources and ETL Technologies
- Separation of Source, Integration and Warehouse Layers
- Replacing of all business keys with GUID primary keys
- Automatic post-load logging of ETL step row counts and outcomes (complete, error etc)
- Google BigQuery (Standard SQL)
- Stitch
- Fivetran (limited support)
- Segment (limited support)
- Hubspot CRM (Stitch, Fivetran)
- Harvest Timesheets (Stitch)
- Xero Accounting (Stitch)
- Stripe Payments (Stitch)
- Asana Projects (Stitch)
- Jira Projects (Stitch)
- Mailchimp Email Marketing (Stitch)
- Segment Events (Segment)
- GCP Billing Exports
- Google Ads (Stitch)
- Facebook Ads (Stitch)
- Intercom Messaging (Stitch)
- Mixpanel Events (Stitch, Fivetran)
- Custom data sources
- Finance (Invoices, Chart of Accounts, Currencies)
- CRM (Deals, Contacts, Companies)
- Projects (Timesheet Projects, Timesheet Tasks, Delivery Projects, Delivery Tasks, Timesheets, Users)
- Marketing (Email lists, Email sends, Email campaigns, Ad Campaigns, Ad Performance, Web Page Views, Web Sessions)
High-priority are:
- Extending Fivetran support to cover all remaining data source types
- Add support for incremental loading of warehouse tables
- Complete test coverage of warehouse and integration layer models
Medium-priority are:
- Extending Segment support to cover all appropriate remaining data sources
- Snowflake as an alternative to Google BigQuery as the warehouse platform
- Enrichment plugins
Issues and pull request from the community are welcomed!
See Setting up a new Warehouse Environment for instructions on how to set-up a new warehouse environment