datafold/data-diff

--dbt is stuck with --json flag

harikaduyu opened this issue · 3 comments

Describe the bug
I'm trying to get JSON output from a --dbt run that uses a state file. It works fine without the --json flag, but when I add --json, it gets stuck and the process never finishes.
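To make the hang easier to reproduce in a script or CI job, here is a minimal sketch of a wrapper that runs a command with a timeout and keeps whatever JSON records were printed before it stalled. The helper names `extract_json_records` and `run_with_timeout` are hypothetical and not part of data-diff. It assumes the JSON records are written to stdout one per line, as in the output pasted in this issue.

```python
import json
import subprocess
import sys


def extract_json_records(text):
    """Return the lines of `text` that parse as JSON objects, skipping log lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # ordinary log line, not a JSON record
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # starts with "{" but is not valid JSON
        if isinstance(obj, dict):
            records.append(obj)
    return records


def run_with_timeout(cmd, timeout_s):
    """Run `cmd` and return (timed_out, json_records) instead of blocking forever.

    Assumption: JSON records are emitted on stdout; stderr is captured but ignored.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        return False, extract_json_records(proc.stdout)
    except subprocess.TimeoutExpired as exc:
        # Partial output may be bytes or None depending on the Python version.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return True, extract_json_records(out)
```

For this report it could be called as `run_with_timeout(["data-diff", "--dbt", "--state", "prod-run-artifacts/manifest.json", "--json", "-d"], 300)`. A `True` first element would indicate that the process did not finish within the (arbitrarily chosen) five minutes.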

Make sure to include the following (minus sensitive information):

  • The command or code you used

data-diff --dbt --state prod-run-artifacts/manifest.json --json -d

  • The run output + error you're getting (including the stack trace).
Running with data-diff=0.11.1
15:44:30 INFO     Parsing file dbt_project.yml                                                                                                                                                                                                                                                                                                                                                                                               dbt_parser.py:287
         INFO     Parsing file /dbt_project/target/manifest.json                                                                                                                                                                                                                                                                                                                                                                             dbt_parser.py:280
         INFO     Parsing file prod-run-artifacts/manifest.json                                                                                                                                                                                                                                                                                                                                                                              dbt_parser.py:280
         INFO     Parsing file target/run_results.json                                                                                                                                                                                                                                                                                                                                                                                       dbt_parser.py:253
         INFO     config: prod_database=None prod_schema=None prod_custom_schema=None datasource_id=None                                                                                                                                                                                                                                                                                                                                     dbt_parser.py:159
         INFO     Parsing file /dbt_project/profiles.yml                                                                                                                                                                                                                                                                                                                                                                                     dbt_parser.py:294
         DEBUG    Found no PKs                                                                                                                                                                                                                                                                                                                                                                                                               dbt_parser.py:465
{"status": "failed", "model": "model.dbt.bi_dagster_asset", "dataset1": ["data-prod", "prod_observability", "bi_dagster_asset"], "dataset2": ["data-prod", "dbt_pr_test_ci_observability", "bi_dagster_asset"], "error": "No primary key found. Add uniqueness tests, meta, or tags.", "version": "1.0.0"}
         DEBUG    Found PKs via Uniqueness tests [fct_tbl_info]: {'col_id'}                                                                                                                                                                                                                                                                                                                                                                  dbt_parser.py:459
         DEBUG    Found PKs via Uniqueness tests [int_table]: {'col_id'}                                                                                                                                                                                                                                                                                                                                                                     dbt_parser.py:459
         DEBUG    Found no PKs                                                                                                                                                                                                                                                                                                                                                                                                               dbt_parser.py:465
{"status": "failed", "model": "model.dbt.dim_latest_email_table", "dataset1": ["data-prod", "prod_schema", "dim_latest_email_table"], "dataset2": ["data-prod", "dbt_pr_test_ci_schema", "dim_latest_email_table"], "error": "No primary key found. Add uniqueness tests, meta, or tags.", "version": "1.0.0"}
         DEBUG    Database 'BigQuery(default_schema='dev', _interactive=False, is_closed=False, _dialect=Dialect(_prevent_overflow_when_concat=False), project='data-dev', dataset='dev', _client=<google.cloud.bigquery.client.Client object at 0x10xxxx>)' does not allow setting timezone. We recommend making sure it's set to 'UTC'.                                                                                                     _connect.py:300
         DEBUG    Database 'BigQuery(default_schema='dev', _interactive=False, is_closed=False, _dialect=Dialect(_prevent_overflow_when_concat=False), project='data-dev', dataset='dev', _client=<google.cloud.bigquery.client.Client object at 0x12xxxx>)' does not allow setting timezone. We recommend making sure it's set to 'UTC'.                                                                                                     _connect.py:300
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                                         base.py:980
                  SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale FROM `data-prod`.`prod_schema`.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'fct_tbl_info' AND table_schema = 'prod_schema'                                                                                                                                                                                                              
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                                            base.py:980
                  SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale FROM `data-prod`.`prod_schema`.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'int_table' AND table_schema = 'prod_schema'                                                                                                                                                                                                       
15:44:32 DEBUG    Running SQL (BigQuery): ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                                  base.py:980
                  SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale FROM `data-prod`.`dbt_pr_test_ci_schema`.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'int_table' AND table_schema = 'dbt_pr_test_ci_schema'                                                                                                                                                                                   
         DEBUG    Running SQL (BigQuery): ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                               base.py:980
                  SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale FROM `data-prod`.`dbt_pr_test_ci_schema`.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'fct_tbl_info' AND table_schema = 'dbt_pr_test_ci_schema'                                                                                                                                                                                          
15:44:33 DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                                            base.py:980
                  SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale FROM `data-prod`.`prod_schema`.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'int_table' AND table_schema = 'prod_schema'                                                                                                                                                                                                       
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                                         base.py:980
                  SELECT column_name, data_type, 6 as datetime_precision, 38 as numeric_precision, 9 as numeric_scale FROM `data-prod`.`prod_schema`.INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'fct_tbl_info' AND table_schema = 'prod_schema'                                                                                                                                                                                                              
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                                             base.py:980
                  SELECT * FROM (SELECT TRIM(`sf_id`), TRIM(`col_name`), TRIM(`col_type`), TRIM(`col_mtd`), TRIM(`col_pl`) FROM `data-prod`.`prod_schema`.`int_table`) AS LIMITED_SELECT LIMIT 64                                                                                                                                                                                                                    
15:44:34 DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                                          base.py:980
                  SELECT * FROM (SELECT TRIM(`unit`) FROM `data-prod`.`prod_schema`.`fct_tbl_info`) AS LIMITED_SELECT LIMIT 64                                                                                                                                                                                                                                                                                                                               
 ..... Cut because text gets too long ....                                                                                                                                     
         DEBUG    Done collecting stats for table #2: ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                       joindiff_tables.py:306
         DEBUG    Testing for null keys: ('data-prod', 'prod_schema', 'int_table') <> ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                       joindiff_tables.py:252
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                                             base.py:980
                  SELECT `col_id` FROM `data-prod`.`prod_schema`.`int_table` WHERE (`col_id` IS NULL)                                                                                                                                                                                                                                                                                                                                              
         DEBUG    Done collecting stats for table #2: ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                   joindiff_tables.py:306
         DEBUG    Testing for null keys: ('data-prod', 'prod_schema', 'fct_tbl_info') <> ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                joindiff_tables.py:252
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                                         base.py:980
                  SELECT `col_id` FROM `data-prod`.`prod_schema`.`fct_tbl_info` WHERE (`col_id` IS NULL)                                                                                                                                                                                                                                                                                                                                                       
15:44:38 DEBUG    Running SQL (BigQuery): ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                                                                   base.py:980
                  SELECT `col_id` FROM `data-prod`.`dbt_pr_test_ci_schema`.`int_table` WHERE (`col_id` IS NULL)                                                                                                                                                                                                                                                                                                                                    
         DEBUG    Running SQL (BigQuery): ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                                                                               base.py:980
                  SELECT `col_id` FROM `data-prod`.`dbt_pr_test_ci_schema`.`fct_tbl_info` WHERE (`col_id` IS NULL)                                                                                                                                                                                                                                                                                                                                             
15:44:39 DEBUG    Counting exclusive rows: ('data-prod', 'prod_schema', 'int_table') <> ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                    joindiff_tables.py:372
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'int_table') <> ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                     base.py:980
                  SELECT count(*) FROM (SELECT * FROM (SELECT (`tmp2`.`col_id` IS NULL) AS `is_exclusive_a`, (`tmp1`.`col_id` IS NULL) AS `is_exclusive_b`, CASE WHEN `tmp1`.`col_id` is distinct from `tmp2`.`col_id` THEN 1 ELSE 0 END AS `is_diff_col_id`, CASE WHEN `tmp1`.`col_value` is distinct from `tmp2`.`col_value` THEN 1 ELSE 0 END AS `is_diff_col_value`, CASE WHEN `tmp1`.`org_id` is distinct from `tmp2`.`org_id` THEN 1 ELSE 0               
                  END AS `is_diff_org_id`, CASE WHEN `tmp1`.`col_combined_value` is distinct from `tmp2`.`col_combined_value` THEN 1 ELSE 0 END AS `is_diff_col_combined_value`, CASE WHEN `tmp1`.`col_mtd` is distinct from `tmp2`.`col_mtd` THEN 1 ELSE 0 END AS `is_diff_col_mtd`, CASE WHEN `tmp1`.`sf_id` is distinct from `tmp2`.`sf_id` THEN 1 ELSE 0 END AS `is_diff_sf_id`, CASE WHEN                                   
                  `tmp1`.`col_ch_prob` is distinct from `tmp2`.`col_ch_prob` THEN 1 ELSE 0 END AS `is_diff_col_ch_prob`, CASE WHEN `tmp1`.`col_name` is distinct from `tmp2`.`col_name` THEN 1 ELSE 0 END AS `is_diff_col_name`, CASE WHEN `tmp1`.`col_pl` is distinct from `tmp2`.`col_pl` THEN 1 ELSE 0 END AS `is_diff_col_pl`, CASE WHEN `tmp1`.`col_type` is distinct from `tmp2`.`col_type` THEN 1 ELSE 0 END AS                     
                  `is_diff_col_type`, cast(`tmp1`.`col_id` as string) AS `col_id_a`, cast(`tmp2`.`col_id` as string) AS `col_id_b`, format('%.11f', `tmp1`.`col_value`) AS `col_value_a`, format('%.11f', `tmp2`.`col_value`) AS `col_value_b`, cast(`tmp1`.`org_id` as string) AS `org_id_a`, cast(`tmp2`.`org_id` as string) AS `org_id_b`, format('%.11f', `tmp1`.`col_combined_value`) AS `col_combined_value_a`,             
                  format('%.11f', `tmp2`.`col_combined_value`) AS `col_combined_value_b`, cast(`tmp1`.`col_mtd` as string) AS `col_mtd_a`, cast(`tmp2`.`col_mtd` as string) AS `col_mtd_b`, cast(`tmp1`.`sf_id` as string) AS `sf_id_a`, cast(`tmp2`.`sf_id` as string) AS `sf_id_b`, format('%.11f', `tmp1`.`col_ch_prob`) AS `col_ch_prob_a`, format('%.11f', `tmp2`.`col_ch_prob`) AS             
                  `col_ch_prob_b`, cast(`tmp1`.`col_name` as string) AS `col_name_a`, cast(`tmp2`.`col_name` as string) AS `col_name_b`, cast(`tmp1`.`col_pl` as string) AS `col_pl_a`, cast(`tmp2`.`col_pl` as string) AS `col_pl_b`, cast(`tmp1`.`col_type` as string) AS `col_type_a`, cast(`tmp2`.`col_type` as string) AS `col_type_b` FROM                                                                                   
                  `data-prod`.`prod_schema`.`int_table` `tmp1` FULL OUTER JOIN `data-prod`.`dbt_pr_test_ci_schema`.`int_table` `tmp2` ON (`tmp1`.`col_id` = `tmp2`.`col_id`)) tmp3 WHERE ((`is_diff_col_id` = 1) OR (`is_diff_col_value` = 1) OR (`is_diff_org_id` = 1) OR (`is_diff_col_combined_value` = 1) OR (`is_diff_col_mtd` = 1) OR                
                  (`is_diff_sf_id` = 1) OR (`is_diff_col_ch_prob` = 1) OR (`is_diff_col_name` = 1) OR (`is_diff_col_pl` = 1) OR (`is_diff_col_type` = 1)) AND (`is_exclusive_a` OR `is_exclusive_b`)) tmp4                                                                                                                                                                                                                                                                       
         DEBUG    Counting exclusive rows: ('data-prod', 'prod_schema', 'fct_tbl_info') <> ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                              joindiff_tables.py:372
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'fct_tbl_info') <> ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                               base.py:980
                  SELECT count(*) FROM (SELECT * FROM (SELECT (`tmp2`.`col_id` IS NULL) AS `is_exclusive_a`, (`tmp1`.`col_id` IS NULL) AS `is_exclusive_b`, CASE WHEN `tmp1`.`col_id` is distinct from `tmp2`.`col_id` THEN 1 ELSE 0 END AS `is_diff_col_id`, CASE WHEN `tmp1`.`org_id` is distinct from `tmp2`.`org_id` THEN 1 ELSE 0 END AS `is_diff_org_id`, CASE WHEN `tmp1`.`is_error` is distinct from `tmp2`.`is_error` THEN 1 ELSE 0 END AS                     
                  `is_diff_is_error`, CASE WHEN `tmp1`.`is_inc_error` is distinct from `tmp2`.`is_inc_error` THEN 1 ELSE 0 END AS `is_diff_is_inc_error`, CASE WHEN `tmp1`.`is_non_inc_error` is distinct from `tmp2`.`is_non_inc_error` THEN 1 ELSE 0 END AS `is_diff_is_non_inc_error`, CASE WHEN `tmp1`.`created_at` is distinct from `tmp2`.`created_at` THEN 1 ELSE 0 END AS `is_diff_created_at`, CASE WHEN `tmp1`.`is_x` is distinct from                       
                  `tmp2`.`is_x` THEN 1 ELSE 0 END AS `is_diff_is_x`, CASE WHEN `tmp1`.`is_inc` is distinct from `tmp2`.`is_inc` THEN 1 ELSE 0 END AS `is_diff_is_inc`, CASE WHEN `tmp1`.`is_x_error` is distinct from `tmp2`.`is_x_error` THEN 1 ELSE 0 END AS `is_diff_is_x_error`, CASE WHEN `tmp1`.`unit` is distinct from `tmp2`.`unit` THEN 1 ELSE 0 END AS `is_diff_unit`, CASE WHEN                
                  `tmp1`.`is_missing_t` is distinct from `tmp2`.`is_missing_t` THEN 1 ELSE 0 END AS `is_diff_is_missing_t`, cast(`tmp1`.`col_id` as string) AS `col_id_a`, cast(`tmp2`.`col_id` as string) AS `col_id_b`, cast(`tmp1`.`org_id` as string) AS `org_id_a`, cast(`tmp2`.`org_id` as string) AS `org_id_b`, cast(cast(`tmp1`.`is_error` as int) as string) AS `is_error_a`, cast(cast(`tmp2`.`is_error` as              
                  int) as string) AS `is_error_b`, cast(cast(`tmp1`.`is_inc_error` as int) as string) AS `is_inc_error_a`, cast(cast(`tmp2`.`is_inc_error` as int) as string) AS `is_inc_error_b`, cast(cast(`tmp1`.`is_non_inc_error` as int) as string) AS `is_non_inc_error_a`, cast(cast(`tmp2`.`is_non_inc_error` as int) as string) AS `is_non_inc_error_b`, FORMAT_TIMESTAMP('%F %H:%M:%E6S', `tmp1`.`created_at`) AS `created_at_a`,                              
                  FORMAT_TIMESTAMP('%F %H:%M:%E6S', `tmp2`.`created_at`) AS `created_at_b`, cast(cast(`tmp1`.`is_x` as int) as string) AS `is_x_a`, cast(cast(`tmp2`.`is_x` as int) as string) AS `is_x_b`, cast(cast(`tmp1`.`is_inc` as int) as string) AS `is_inc_a`, cast(cast(`tmp2`.`is_inc` as int) as string) AS `is_inc_b`, cast(cast(`tmp1`.`is_x_error` as int) as string) AS                                      
                  `is_x_error_a`, cast(cast(`tmp2`.`is_x_error` as int) as string) AS `is_x_error_b`, cast(`tmp1`.`unit` as string) AS `unit_a`, cast(`tmp2`.`unit` as string) AS `unit_b`, cast(cast(`tmp1`.`is_missing_t` as int) as string) AS `is_missing_t_a`, cast(cast(`tmp2`.`is_missing_t` as int) as string) AS `is_missing_t_b` FROM                                                    
                  `data-prod`.`prod_schema`.`fct_tbl_info` `tmp1` FULL OUTER JOIN `data-prod`.`dbt_pr_test_ci_schema`.`fct_tbl_info` `tmp2` ON (`tmp1`.`col_id` = `tmp2`.`col_id`)) tmp3 WHERE ((`is_diff_col_id` = 1) OR (`is_diff_org_id` = 1) OR (`is_diff_is_error` = 1) OR (`is_diff_is_inc_error` = 1) OR (`is_diff_is_non_inc_error` = 1) OR (`is_diff_created_at` = 1)              
                  OR (`is_diff_is_x` = 1) OR (`is_diff_is_inc` = 1) OR (`is_diff_is_x_error` = 1) OR (`is_diff_unit` = 1) OR (`is_diff_is_missing_t` = 1)) AND (`is_exclusive_a` OR `is_exclusive_b`)) tmp4                                                                                                                                                                                                                                                      
15:44:40 DEBUG    Counting differences per column: ('data-prod', 'prod_schema', 'int_table') <> ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                           joindiff_tables.py:346
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'int_table') <> ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                    base.py:980
                  SELECT sum(`is_diff_col_id`), sum(`is_diff_col_value`), sum(`is_diff_org_id`), sum(`is_diff_col_combined_value`), sum(`is_diff_col_mtd`), sum(`is_diff_sf_id`), sum(`is_diff_col_ch_prob`), sum(`is_diff_col_name`), sum(`is_diff_col_pl`), sum(`is_diff_col_type`) FROM (SELECT (`tmp2`.`col_id` IS NULL) AS `is_exclusive_a`, (`tmp1`.`col_id` IS NULL) AS `is_exclusive_b`, CASE WHEN                              
                  `tmp1`.`col_id` is distinct from `tmp2`.`col_id` THEN 1 ELSE 0 END AS `is_diff_col_id`, CASE WHEN `tmp1`.`col_value` is distinct from `tmp2`.`col_value` THEN 1 ELSE 0 END AS `is_diff_col_value`, CASE WHEN `tmp1`.`org_id` is distinct from `tmp2`.`org_id` THEN 1 ELSE 0 END AS `is_diff_org_id`, CASE WHEN `tmp1`.`col_combined_value` is distinct from `tmp2`.`col_combined_value` THEN 1 ELSE 0 END AS                         
                  `is_diff_col_combined_value`, CASE WHEN `tmp1`.`col_mtd` is distinct from `tmp2`.`col_mtd` THEN 1 ELSE 0 END AS `is_diff_col_mtd`, CASE WHEN `tmp1`.`sf_id` is distinct from `tmp2`.`sf_id` THEN 1 ELSE 0 END AS `is_diff_sf_id`, CASE WHEN `tmp1`.`col_ch_prob` is distinct from `tmp2`.`col_ch_prob` THEN 1 ELSE 0 END AS `is_diff_col_ch_prob`, CASE WHEN `tmp1`.`col_name` is distinct             
                  from `tmp2`.`col_name` THEN 1 ELSE 0 END AS `is_diff_col_name`, CASE WHEN `tmp1`.`col_pl` is distinct from `tmp2`.`col_pl` THEN 1 ELSE 0 END AS `is_diff_col_pl`, CASE WHEN `tmp1`.`col_type` is distinct from `tmp2`.`col_type` THEN 1 ELSE 0 END AS `is_diff_col_type`, cast(`tmp1`.`col_id` as string) AS `col_id_a`, cast(`tmp2`.`col_id` as string) AS `col_id_b`, format('%.11f', `tmp1`.`col_value`) AS            
                  `col_value_a`, format('%.11f', `tmp2`.`col_value`) AS `col_value_b`, cast(`tmp1`.`org_id` as string) AS `org_id_a`, cast(`tmp2`.`org_id` as string) AS `org_id_b`, format('%.11f', `tmp1`.`col_combined_value`) AS `col_combined_value_a`, format('%.11f', `tmp2`.`col_combined_value`) AS `col_combined_value_b`, cast(`tmp1`.`col_mtd` as string) AS `col_mtd_a`,                                     
                  cast(`tmp2`.`col_mtd` as string) AS `col_mtd_b`, cast(`tmp1`.`sf_id` as string) AS `sf_id_a`, cast(`tmp2`.`sf_id` as string) AS `sf_id_b`, format('%.11f', `tmp1`.`col_ch_prob`) AS `col_ch_prob_a`, format('%.11f', `tmp2`.`col_ch_prob`) AS `col_ch_prob_b`, cast(`tmp1`.`col_name` as string) AS `col_name_a`, cast(`tmp2`.`col_name` as string) AS `col_name_b`,                        
                  cast(`tmp1`.`col_pl` as string) AS `col_pl_a`, cast(`tmp2`.`col_pl` as string) AS `col_pl_b`, cast(`tmp1`.`col_type` as string) AS `col_type_a`, cast(`tmp2`.`col_type` as string) AS `col_type_b` FROM `data-prod`.`prod_schema`.`int_table` `tmp1` FULL OUTER JOIN                                                                                                             
                  `data-prod`.`dbt_pr_test_ci_schema`.`int_table` `tmp2` ON (`tmp1`.`col_id` = `tmp2`.`col_id`)) tmp3 WHERE ((`is_diff_col_id` = 1) OR (`is_diff_col_value` = 1) OR (`is_diff_org_id` = 1) OR (`is_diff_col_combined_value` = 1) OR (`is_diff_col_mtd` = 1) OR (`is_diff_sf_id` = 1) OR (`is_diff_col_ch_prob` = 1) OR (`is_diff_col_name` = 1) OR                             
                  (`is_diff_col_pl` = 1) OR (`is_diff_col_type` = 1))                                                                                                                                                                                                                                                                                                                                                                                                                               
         DEBUG    Counting differences per column: ('data-prod', 'prod_schema', 'fct_tbl_info') <> ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                   joindiff_tables.py:346
         DEBUG    Running SQL (BigQuery): ('data-prod', 'prod_schema', 'fct_tbl_info') <> ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                            base.py:980
                  SELECT sum(`is_diff_col_id`), sum(`is_diff_org_id`), sum(`is_diff_is_error`), sum(`is_diff_is_inc_error`), sum(`is_diff_is_non_inc_error`), sum(`is_diff_created_at`), sum(`is_diff_is_x`), sum(`is_diff_is_inc`), sum(`is_diff_is_x_error`), sum(`is_diff_unit`), sum(`is_diff_is_missing_t`) FROM (SELECT (`tmp2`.`col_id` IS NULL) AS `is_exclusive_a`, (`tmp1`.`col_id` IS NULL) AS                            
                  `is_exclusive_b`, CASE WHEN `tmp1`.`col_id` is distinct from `tmp2`.`col_id` THEN 1 ELSE 0 END AS `is_diff_col_id`, CASE WHEN `tmp1`.`org_id` is distinct from `tmp2`.`org_id` THEN 1 ELSE 0 END AS `is_diff_org_id`, CASE WHEN `tmp1`.`is_error` is distinct from `tmp2`.`is_error` THEN 1 ELSE 0 END AS `is_diff_is_error`, CASE WHEN `tmp1`.`is_inc_error` is distinct from `tmp2`.`is_inc_error` THEN 1 ELSE 0 END AS                         
                  `is_diff_is_inc_error`, CASE WHEN `tmp1`.`is_non_inc_error` is distinct from `tmp2`.`is_non_inc_error` THEN 1 ELSE 0 END AS `is_diff_is_non_inc_error`, CASE WHEN `tmp1`.`created_at` is distinct from `tmp2`.`created_at` THEN 1 ELSE 0 END AS `is_diff_created_at`, CASE WHEN `tmp1`.`is_x` is distinct from `tmp2`.`is_x` THEN 1 ELSE 0 END AS `is_diff_is_x`, CASE WHEN `tmp1`.`is_inc` is distinct from                    
                  `tmp2`.`is_inc` THEN 1 ELSE 0 END AS `is_diff_is_inc`, CASE WHEN `tmp1`.`is_x_error` is distinct from `tmp2`.`is_x_error` THEN 1 ELSE 0 END AS `is_diff_is_x_error`, CASE WHEN `tmp1`.`unit` is distinct from `tmp2`.`unit` THEN 1 ELSE 0 END AS `is_diff_unit`, CASE WHEN `tmp1`.`is_missing_t` is distinct from `tmp2`.`is_missing_t` THEN 1 ELSE 0 END AS                                       
                  `is_diff_is_missing_t`, cast(`tmp1`.`col_id` as string) AS `col_id_a`, cast(`tmp2`.`col_id` as string) AS `col_id_b`, cast(`tmp1`.`org_id` as string) AS `org_id_a`, cast(`tmp2`.`org_id` as string) AS `org_id_b`, cast(cast(`tmp1`.`is_error` as int) as string) AS `is_error_a`, cast(cast(`tmp2`.`is_error` as int) as string) AS `is_error_b`, cast(cast(`tmp1`.`is_inc_error` as int) as string) AS                        
                  `is_inc_error_a`, cast(cast(`tmp2`.`is_inc_error` as int) as string) AS `is_inc_error_b`, cast(cast(`tmp1`.`is_non_inc_error` as int) as string) AS `is_non_inc_error_a`, cast(cast(`tmp2`.`is_non_inc_error` as int) as string) AS `is_non_inc_error_b`, FORMAT_TIMESTAMP('%F %H:%M:%E6S', `tmp1`.`created_at`) AS `created_at_a`, FORMAT_TIMESTAMP('%F %H:%M:%E6S', `tmp2`.`created_at`) AS `created_at_b`,                                                
                  cast(cast(`tmp1`.`is_x` as int) as string) AS `is_x_a`, cast(cast(`tmp2`.`is_x` as int) as string) AS `is_x_b`, cast(cast(`tmp1`.`is_inc` as int) as string) AS `is_inc_a`, cast(cast(`tmp2`.`is_inc` as int) as string) AS `is_inc_b`, cast(cast(`tmp1`.`is_x_error` as int) as string) AS `is_x_error_a`, cast(cast(`tmp2`.`is_x_error` as int) as string) AS                  
                  `is_x_error_b`, cast(`tmp1`.`unit` as string) AS `unit_a`, cast(`tmp2`.`unit` as string) AS `unit_b`, cast(cast(`tmp1`.`is_missing_t` as int) as string) AS `is_missing_t_a`, cast(cast(`tmp2`.`is_missing_t` as int) as string) AS `is_missing_t_b` FROM `data-prod`.`prod_schema`.`fct_tbl_info` `tmp1` FULL OUTER JOIN                                     
                  `data-prod`.`dbt_pr_test_ci_schema`.`fct_tbl_info` `tmp2` ON (`tmp1`.`col_id` = `tmp2`.`col_id`)) tmp3 WHERE ((`is_diff_col_id` = 1) OR (`is_diff_org_id` = 1) OR (`is_diff_is_error` = 1) OR (`is_diff_is_inc_error` = 1) OR (`is_diff_is_non_inc_error` = 1) OR (`is_diff_created_at` = 1) OR (`is_diff_is_x` = 1) OR (`is_diff_is_inc` = 1) OR (`is_diff_is_x_error` = 1)            
                  OR (`is_diff_unit` = 1) OR (`is_diff_is_missing_t` = 1))                                                                                                                                                                                                                                                                                                                                                                                                                      
15:44:41 INFO     Diffing complete: ('data-prod', 'prod_schema', 'int_table') <> ('data-prod', 'dbt_pr_test_ci_schema', 'int_table')                                                                                                                                                                                                                                                                                                      joindiff_tables.py:165
{"status": "success", "result": "identical", "model": "model.dbt.int_table", "dataset1": ["data-prod", "prod_schema", "int_table"], "dataset2": ["data-prod", "dbt_pr_test_ci_schema", "int_table"], "rows": {"exclusive": {"dataset1": [], "dataset2": []}, "diff": []}, "summary": {"rows": {"total": {"dataset1": 27346, "dataset2": 27346}, 
"exclusive": {"dataset1": 0, "dataset2": 0}, "updated": 0, "unchanged": 27346}, "stats": {"diffCounts": {"col_value": 0, "org_id": 0, "col_combined_value": 0, "col_mtd": 0, "sf_id": 0, "col_ch_prob": 0, "col_name": 0, "col_pl": 0, "col_type": 0}}}, "columns": {"dataset1": [{"name": "col_id", "type": "INT64", "kind": "integer"}, {"name": "org_id", "type": "INT64", "kind": "integer"}, {"name": "sf_id", "type": 
"STRING", "kind": "unsupported"}, {"name": "col_name", "type": "STRING", "kind": "unsupported"}, {"name": "col_type", "type": "STRING", "kind": "unsupported"}, {"name": "col_value", "type": "FLOAT64", "kind": "float"}, {"name": "col_combined_value", "type": "FLOAT64", "kind": "float"}, {"name": "col_mtd", "type": "STRING", "kind": "unsupported"}, {"name": "col_pl", "type": "STRING", "kind": "unsupported"}, {"name": "col_ch_prob", "type": "FLOAT64", 
"kind": "float"}], "dataset2": [{"name": "col_id", "type": "INT64", "kind": "integer"}, {"name": "org_id", "type": "INT64", "kind": "integer"}, {"name": "sf_id", "type": "STRING", "kind": "unsupported"}, {"name": "col_name", "type": "STRING", "kind": "unsupported"}, {"name": "col_type", "type": "STRING", "kind": "unsupported"}, {"name": "col_value", "type": "FLOAT64", "kind": "float"}, {"name": "col_combined_value", "type": "FLOAT64", "kind": "float"}, 
{"name": "col_mtd", "type": "STRING", "kind": "unsupported"}, {"name": "col_pl", "type": "STRING", "kind": "unsupported"}, {"name": "col_ch_prob", "type": "FLOAT64", "kind": "float"}], "primaryKey": ["col_id"], "exclusive": {"dataset1": [], "dataset2": []}, "typeChanged": []}, "version": "1.1.0"}
15:45:22 INFO     Diffing complete: ('data-prod', 'prod_schema', 'fct_tbl_info') <> ('data-prod', 'dbt_pr_test_ci_schema', 'fct_tbl_info')                                                                                                                                                                                                                                                                                               joindiff_tables.py:165
⠼ 
In Progress fct_tbl_info
In Progress int_table
Diffing complete: ('data-prod', 'prod_schema', 'fct_tbl

The last line is also truncated: the output is cut off before the table name is complete.

Describe the environment

I'm using macOS 14.3.1

I have exactly the same issue. Without --json it's super fast; with --json it seems to run forever.
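A possible stopgap while the hang is unresolved, offered as a minimal sketch rather than a fix: run the command under a timeout and pull whatever JSON objects were already printed out of the captured output. This assumes the `--json` results are written to stdout interleaved with other text, as the log above suggests. The helper names `extract_json_objects` and `run_with_timeout` are hypothetical and are not part of data-diff.

```python
import json
import subprocess


def extract_json_objects(text):
    """Yield every top-level JSON object found in mixed log/JSON text."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns it together with the index where it ended.
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, keep scanning
            continue
        yield obj
        pos = end


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return (finished, stdout_text). The process is killed on timeout."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        return True, proc.stdout
    except subprocess.TimeoutExpired as exc:
        out = exc.stdout or ""
        if isinstance(out, bytes):  # partial output may arrive as bytes
            out = out.decode(errors="replace")
        return False, out


# Hypothetical usage with the command from this issue (timeout value is arbitrary):
# finished, out = run_with_timeout(
#     ["data-diff", "--dbt", "--state", "prod-run-artifacts/manifest.json", "--json", "-d"],
#     600,
# )
# results = list(extract_json_objects(out))
```

This only salvages results that were printed before the process stalled. It does not address why the run never exits.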

This issue has been marked as stale because it has been open for 60 days with no activity. If you would like the issue to remain open, please comment on the issue and it will be added to the triage queue. Otherwise, it will be closed in 7 days.

Hi @harikaduyu,

I'm sorry for the delay in responding. Thank you for trying out data-diff and for opening this issue!

We made a hard decision to sunset the data-diff package and won't provide further development or support. Diffing functionality will continue to be available in Datafold Cloud. Feel free to take it for a trial or contact us at support@datafold.com if you have any questions.

-Gleb