data-diff fails on tables join step with databricks connector
unidentified-user opened this issue · 3 comments
Description:
I am encountering an issue with the data-diff
command failing on the tables join step when working with Databricks.
It appears that the key column data type is identified as Text(_notes=[])
, which causes the process to fail.
Command:
data-diff -d --dbt --select <path_to_my_model>
.
Output:
NotImplementedError: Cannot use a column of type Text(_notes=[]) as a key
You can see the full log in the attachments.
Environment:
- MacOS Sonoma 14.0
- python 3.9.18
- data-diff 0.9.17
- dbt-databricks 1.5.6
- dbt-core 1.5.9
- databricks-sql-connector 3.0.0
Attachments:
Can you send a screenshot of examples of your primary key column that's being interpreted as text? I tested the below against my databricks sql warehouse and it interpreted it as String_VaryingAlphanum
and worked just fine.
I had to use these dbt versions as the ones you listed didn't play nicely with Databricks SQL warehouses.
data-diff 0.9.17
databricks-sdk 0.9.0
databricks-sql-connector 2.9.3
dbt-core 1.7.4
dbt-databricks 1.7.3
test seed data
org_id,event_timestamp,activity
how,2022-03-29T09:04:05Z,feature A
are,2022-03-29T20:24:28Z,feature A
you,2022-03-31T15:13:42Z,feature A
doing,2022-03-31T23:01:37Z,feature A
today,2022-04-01T15:05:01Z,feature A
14:01:05 DEBUG Running SQL (Databricks): base.py:959
SELECT * FROM (SELECT TRIM(`org_id`), TRIM(`activity`) FROM `hive_metastore`.`prod_sung`.`ref_demo_seed`) AS LIMITED_SELECT LIMIT 64
INFO [Databricks] Schema = {'org_id': String_VaryingAlphanum(_notes=[]), 'event_timestamp': Timestamp(_notes=[], precision=6, rounds=True), 'activity': String_VaryingAlphanum(_notes=[])} schema.py:12
14:01:06 DEBUG Running SQL (Databricks): base.py:959
SELECT * FROM (SELECT TRIM(`org_id`), TRIM(`activity`) FROM `hive_metastore`.`dev_sung`.`ref_demo_seed`) AS LIMITED_SELECT LIMIT 64
INFO [Databricks] Schema = {'org_id': String_VaryingAlphanum(_notes=[]), 'event_timestamp': Timestamp(_notes=[], precision=6, rounds=True), 'activity': String_VaryingAlphanum(_notes=[])}
Hey @sungchun12
Thank you for the response.
I eventually realized the issue was with my key not being alphanumeric, as it contains a colon sign (:
).
See the example below:
Glad you figured it out! Closing this issue then :)