datafold/data-diff

Key compare fails between VARCHAR2(40) (Oracle) and STRING (BigQuery) datatypes

mhe2024 opened this issue · 1 comment

I am trying to run a cross-database comparison between Oracle and BigQuery.
The key column is of type VARCHAR2(40) in Oracle and STRING in BigQuery.
Most of the keys are UUIDs, but some test records use integers as keys.
In the log I see the following entries:

DEFAULT 2024-05-15T11:39:57.627042Z 2024-05-15 11:39:57,628 - Got the following input request: {'a': {'system': 'data-platform', 'schema': 'access_internal_v3', 'table': 'dt_alg_diagnose_grp', 'pks': ['alg_diagnose_grp_id']}, 'b': {'system': 'dwh', 'schema': 'DM_MIG', 'table': 'DT_ALG_DIAGNOSE_GRP', 'pks': ['ALG_DIAGNOSE_GRP_ID']}}
DEFAULT 2024-05-15T11:39:58.847835Z 2024-05-15 11:39:58,849 - Mixed UUID/Non-UUID values detected in column access_internal_v3.dt_alg_diagnose_grp.alg_diagnose_grp_id, disabling UUID support.
DEFAULT 2024-05-15T11:39:58.848034Z 2024-05-15 11:39:58,849 - [BigQuery] Schema = {'alg_diagnose_grp_id': String_VaryingAlphanum(_notes=[], collation=None)}
DEFAULT 2024-05-15T11:39:58.938690Z 2024-05-15 11:39:58,940 - [Oracle] Schema = {'ALG_DIAGNOSE_GRP_ID': String_UUID(_notes=[], collation=None, lowercase=True, uppercase=False)}
DEFAULT 2024-05-15T11:39:58.940064Z 2024-05-15 11:39:58,940 - Exception on / [POST]
[...]
DEFAULT 2024-05-15T11:40:00.015535Z data_diff.errors.DataDiffMismatchingKeyTypesError: Key columns alg_diagnose_grp_id and ALG_DIAGNOSE_GRP_ID can't be compared due to different types.

Is it possible to override the Oracle schema in data-diff?
We are using the latest version, 0.11.1.
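
A note on what the log seems to show: the BigQuery column was downgraded to `String_VaryingAlphanum` after mixed UUID/non-UUID values were detected, while the Oracle column stayed `String_UUID`. One plausible reading is that the key type is inferred from a sample of values on each side, and that only the BigQuery sample contained one of the integer test keys. The sketch below illustrates that idea only. It is not data-diff's actual implementation: `looks_like_uuid`, `infer_key_type`, and the sample values are hypothetical, and only the two type names are taken from the log.

```python
import uuid


def looks_like_uuid(value: str) -> bool:
    """Return True if the string parses as a UUID (hypothetical helper)."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def infer_key_type(sample: list[str]) -> str:
    """Guess a key type from sampled values (assumed logic, not data-diff's code).

    The type names mirror the ones printed in the log above.
    """
    if sample and all(looks_like_uuid(v) for v in sample):
        return "String_UUID"
    # A single non-UUID value in the sample disables UUID handling.
    return "String_VaryingAlphanum"


# Made-up samples: the BigQuery sample happens to include an integer test key,
# the Oracle sample does not.
bigquery_sample = ["3f2b8c1e-6d4a-4e9b-8a57-0c1d2e3f4a5b", "1001"]
oracle_sample = [
    "3f2b8c1e-6d4a-4e9b-8a57-0c1d2e3f4a5b",
    "a1b2c3d4-e5f6-4789-90ab-cdef01234567",
]

bigquery_type = infer_key_type(bigquery_sample)
oracle_type = infer_key_type(oracle_sample)

# The two sides disagree, which matches the mismatching-key-types error above.
print(bigquery_type, oracle_type, bigquery_type == oracle_type)
```

If this reading is right, the mismatch comes from the sampled values rather than from the declared VARCHAR2(40) and STRING types, so the same tables could compare successfully or fail depending on which rows are sampled.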

Hi @mhe2024,

Thank you for trying out data-diff and for taking the time to open this issue. We have made the hard decision to sunset the data-diff package and will not provide further development or support. Diffing functionality will continue to be available in Datafold Cloud. Over the past few months we have completely rewritten the diffing engine in the cloud and solved the fundamental issues with the original algorithm used in the data-diff package. Feel free to take it for a trial, or contact us at support@datafold.com if you have any questions.

-Gleb