
Compare tables within or across databases

Primary language: Python. License: MIT.

Datafold

data-diff

Develop dbt models faster by testing as you code.

See how every change to dbt code affects the data produced in the modified model and downstream.


What is data-diff?

data-diff is an open-source package that shows you the impact of your dbt code changes on your dbt models as you code.



👀 Watch 4-min demo video here

Getting Started

Install data-diff

Install data-diff with the command that is specific to the database you use with dbt.

Snowflake

pip install data-diff 'data-diff[snowflake,dbt]' -U

BigQuery

pip install data-diff 'data-diff[dbt]' google-cloud-bigquery -U

Redshift

pip install data-diff 'data-diff[redshift,dbt]' -U

Postgres

pip install data-diff 'data-diff[postgres,dbt]' -U

Databricks

pip install data-diff 'data-diff[databricks,dbt]' -U

DuckDB

pip install data-diff 'data-diff[duckdb,dbt]' -U
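If you script your environment setup, the commands above can be generated from the database name. The following is a minimal sketch, not part of data-diff: `pip_command` is a hypothetical helper, and the package lists are copied from the commands above.

```python
import shlex

# Databases whose install command above uses an extra of the same name.
DATABASES_WITH_EXTRA = {"snowflake", "redshift", "postgres", "databricks", "duckdb"}


def pip_command(database: str) -> str:
    """Return the pip command listed above for the given database (hypothetical helper)."""
    key = database.lower()
    if key == "bigquery":
        # The BigQuery command above installs a separate package instead of an extra.
        packages = ["data-diff", "data-diff[dbt]", "google-cloud-bigquery"]
    elif key in DATABASES_WITH_EXTRA:
        packages = ["data-diff", f"data-diff[{key},dbt]"]
    else:
        raise ValueError(f"no install command listed for {database!r}")
    # shlex.quote adds the single quotes that protect the brackets from the shell.
    return "pip install " + " ".join(shlex.quote(p) for p in packages) + " -U"


if __name__ == "__main__":
    print(pip_command("snowflake"))
```

Databases that are not listed above raise a `ValueError` rather than guessing at an extra name.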

Add these variables to your dbt_project.yml, replacing the values with your production database and schema.

#dbt_project.yml
vars:
  data_diff:
    prod_database: my_database
    prod_schema: my_default_schema

Run your first data diff!

dbt run && data-diff --dbt
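To make the idea of a diff concrete, here is a conceptual sketch of comparing a production table with a development table by primary key. It uses Python's built-in sqlite3 module and is not data-diff's implementation or output format. The helper `naive_diff` and the tables `prod_orders` and `dev_orders` are hypothetical names chosen for illustration.

```python
import sqlite3


def naive_diff(conn, prod_table, dev_table, key="id"):
    """Return (removed, added, changed) key lists for two tables with the same columns."""

    def load(table):
        # Table names are trusted here; this is an illustration, not production code.
        cur = conn.execute(f'SELECT * FROM "{table}"')
        columns = [d[0] for d in cur.description]
        key_index = columns.index(key)
        return {row[key_index]: row for row in cur}

    prod, dev = load(prod_table), load(dev_table)
    removed = sorted(prod.keys() - dev.keys())  # only in production
    added = sorted(dev.keys() - prod.keys())    # only in development
    changed = sorted(k for k in prod.keys() & dev.keys() if prod[k] != dev[k])
    return removed, added, changed


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prod_orders (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE dev_orders  (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO prod_orders VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO dev_orders  VALUES (1, 10), (2, 25), (4, 40);
""")

removed, added, changed = naive_diff(conn, "prod_orders", "dev_orders")
print(removed, added, changed)  # prints: [3] [4] [2]
```

This sketch loads both tables into memory, which only works for small tables.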

We recommend starting with our setup instructions, which include examples and further details.

Please reach out on the dbt Slack in #tools-datafold if you have any trouble whatsoever getting started!



Diffing between databases

Check out our documentation if you're looking to compare data across databases (for example, between Postgres and Snowflake).


Contributors

We thank everyone who contributed so far!



License

This project is licensed under the terms of the MIT License.