Snowflake and Postgres dialects normalize dates differently than other database types
RenanBasilio opened this issue · 1 comments
Describe the bug
In the Snowflake and Postgres database dialects date columns are normalized differently than in other dialects. This causes a false positive diff when comparing those dates with date columns in other database types.
For example, consider the comparison between MySQL and Snowflake:
This is what the schemas look like (from logging with --verbose):
INFO [MySQL] Schema = {
'id': Integer(_notes=[], precision=0, python_type=<class 'int'>),
'some_date': Date(_notes=[], precision=6, rounds=True)}
INFO [Snowflake] Schema = {
'ID': Decimal(_notes=[], precision=0),
'SOME_DATE': Date(_notes=[], precision=6, rounds=False)}
This query is used when pulling a date column from the MySQL database:
SELECT
cast(`id` as char),
cast(cast( cast(`some_date` as datetime(6)) as datetime(6)) as char)
FROM `some_schema`.`some_table`
This returns a pair like
1, 2024-03-28 00:00:00.000000
In turn, in Snowflake the query is
SELECT
cast(cast("ID" as decimal(38, 0)) as string),
("SOME_DATE"::varchar)
FROM "SOME_SCHEMA"."SOME_TABLE"
which returns a row like
1, 2024-03-28
This is then detected as a mismatch.
This seems to stem from the normalize_timestamp
function in the database dialect. In the BaseDialect, it is documented that the function should return a date formatted as YYYY-MM-DD HH:mm:SS.FFFFFF
, which is accurate to the MySQL result but not to the Snowflake or Postgres dialects which only cast it directly as a string.
data-diff/data_diff/databases/mysql.py
Lines 108 to 113 in b4e2d4b
data-diff/data_diff/databases/postgresql.py
Lines 113 to 125 in b4e2d4b
Describe the environment
Data Diff Open Source v0.11.1
Python 3.9.12
MacOSX Sonoma 14.2.1 (23C71)
Hi @RenanBasilio,
Thank you for trying out data-diff and for taking the time to open this issue. We made a hard decision to sunset the data-diff package and won't provide further development or support. Diffing functionality will continue to be available in Datafold Cloud. We have completely rewritten the diffing engine in the cloud over the past few months and have solved the fundamental issues with the original algorithm used in the data-diff package. Feel free to take it for a trial or contact us at support@datafold.com if you have any questions.
-Gleb