Python API diff_tables() throws duckdb.BinderException
cmcnicoll opened this issue · 1 comment
cmcnicoll commented
Describe the bug
I get an error when I adapt the following example to DuckDB:
```python
table1 = connect_to_table('postgresql:///', 'Rating', 'id')
list(diff_tables(table1, table1))
# []
```
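For reference, the `[]` in that example is the expected result, because a table diffed against itself has no differing rows. The plain-Python sketch below mimics that outcome with a hypothetical `diff_rows` helper that matches rows by key in memory. It assumes differences are reported as `('-', row)` / `('+', row)` pairs. It is not data-diff's implementation, which compares checksums of key ranges inside the databases rather than pulling every row into Python.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Row = Tuple[int, str]  # (key, value), like (test_id, test_value)


def diff_rows(rows1: List[Row], rows2: List[Row]) -> Iterator[Tuple[str, Row]]:
    """Yield ('-', row) for rows only in rows1 and ('+', row) for rows only in rows2.

    Hypothetical helper. Rows are matched on their first element (the key);
    a key present on both sides with different values yields '-' then '+'.
    """
    by_key1: Dict[int, Row] = {row[0]: row for row in rows1}
    by_key2: Dict[int, Row] = {row[0]: row for row in rows2}
    for key in sorted(by_key1.keys() | by_key2.keys()):
        old: Optional[Row] = by_key1.get(key)
        new: Optional[Row] = by_key2.get(key)
        if old == new:
            continue  # identical on both sides: nothing to report
        if old is not None:
            yield ("-", old)
        if new is not None:
            yield ("+", new)


rows = [(1, "a"), (2, "b"), (3, "c")]
print(list(diff_rows(rows, rows)))  # []
print(list(diff_rows(rows, [(1, "a"), (2, "x")])))
# [('-', (2, 'b')), ('+', (2, 'x')), ('-', (3, 'c'))]
```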
Code:
```python
import duckdb  # 0.10.2
from data_diff import connect_to_table, diff_tables  # 0.11.1

with duckdb.connect("test.duckdb") as con:
    con.sql("drop table if exists test_table")
    con.sql(
        "create table test_table as select * from read_csv('test/*.csv', header = true)"
    )
    con.sql("show all tables").show()
    con.table("test_table").show()

test_table = connect_to_table(
    "duckdb://test.duckdb", "test.main.test_table", ("test_id")
)
print(test_table, "\n")

list(diff_tables(test_table, test_table))
```
Output:
```
$ py test_data_diff.py
┌──────────┬─────────┬────────────┬───────────────────────┬───────────────────┬───────────┐
│ database │ schema  │    name    │     column_names      │   column_types    │ temporary │
│ varchar  │ varchar │  varchar   │       varchar[]       │     varchar[]     │  boolean  │
├──────────┼─────────┼────────────┼───────────────────────┼───────────────────┼───────────┤
│ test     │ main    │ test_table │ [test_id, test_value] │ [BIGINT, VARCHAR] │ false     │
└──────────┴─────────┴────────────┴───────────────────────┴───────────────────┴───────────┘
┌─────────┬────────────┐
│ test_id │ test_value │
│  int64  │  varchar   │
├─────────┼────────────┤
│       1 │ a          │
│       2 │ b          │
│       3 │ c          │
└─────────┴────────────┘
TableSegment(database=DuckDB(default_schema='main', _interactive=False, is_closed=False,
_dialect=Dialect(_prevent_overflow_when_concat=False), _args={'filepath': ''},
_conn=<duckdb.duckdb.DuckDBPyConnection object at 0x000001AD984E6C30>), table_path=('test', 'main', 'test_table'),
key_columns=('test_id',), update_column=None, extra_columns=(), ignored_columns=frozenset(), min_key=None, max_key=None,
min_update=None, max_update=None, where=None, case_sensitive=True, _schema=None)

Traceback (most recent call last):
  File "C:\code\duckdb\data-diff\test_data_diff.py", line 17, in <module>
    list(diff_tables(test_table, test_table))
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\diff_tables.py", line 95, in __iter__
    for i in self.diff:
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\diff_tables.py", line 266, in _diff_tables_wrapper
    raise error
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\diff_tables.py", line 236, in _diff_tables_wrapper
    table1, table2 = self._threaded_call("with_schema", [table1, table2])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\diff_tables.py", line 51, in _threaded_call
    return list(self._thread_map(methodcaller(func), iterable))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cmcni\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 619, in result_iterator
    yield _result_or_cancel(fs.pop())
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cmcni\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 317, in _result_or_cancel
    return fut.result(timeout)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cmcni\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\cmcni\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\_base.py", line 401, in __get_result
    raise self._exception
  File "C:\Users\cmcni\AppData\Local\Programs\Python\Python312\Lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\table_segment.py", line 153, in with_schema
    return self._with_raw_schema(self.database.query_table_schema(self.table_path))
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\databases\base.py", line 1048, in query_table_schema
    rows = self.query(self.select_table_schema(path), list, log_message=path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\databases\base.py", line 996, in query
    res = self._query(sql_code)
          ^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\databases\duckdb.py", line 141, in _query
    return self._query_conn(self._conn, sql_code)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\databases\base.py", line 1188, in _query_conn
    return apply_query(callback, sql_code)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\databases\base.py", line 211, in apply_query
    return callback(sql_code)
           ^^^^^^^^^^^^^^^^^^
  File "C:\code\duckdb\.venv\Lib\site-packages\data_diff\databases\base.py", line 1173, in _query_cursor
    c.execute(sql_code)
duckdb.duckdb.BinderException: Binder Error: Catalog "test" does not exist!
```
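Two details in the output fit together: `show all tables` lists the file's catalog as `test`, yet the printed `TableSegment` records `_args={'filepath': ''}` alongside `table_path=('test', 'main', 'test_table')`. The stdlib-only sketch below models one possible explanation under loudly stated assumptions: (1) the connector takes the file path from the URL's path component, which is empty for `duckdb://test.duckdb` because `test.duckdb` parses as the host; (2) an empty path makes DuckDB open an in-memory database, whose catalog is named `memory`; (3) a file database's catalog is named after the file name without its extension. The helpers `split_table_path`, `filepath_from_uri`, `default_catalog` and `check_catalog` are hypothetical; they exist in neither data-diff nor DuckDB and call neither library.

```python
from pathlib import Path
from typing import Tuple
from urllib.parse import urlparse


def split_table_path(path: str) -> Tuple[str, ...]:
    """Naively split a dotted name such as "test.main.test_table".

    Hypothetical helper: ignores quoted identifiers that contain dots.
    """
    return tuple(path.split("."))


def filepath_from_uri(uri: str) -> str:
    """ASSUMPTION: the connector takes the file path from the URL path slot."""
    return urlparse(uri).path


def default_catalog(filepath: str) -> str:
    """ASSUMPTION: mirrors DuckDB's naming, "memory" for an in-memory
    database, otherwise the file name without its extension."""
    if filepath in ("", ":memory:"):
        return "memory"
    return Path(filepath).stem


def check_catalog(uri: str, table: str) -> str:
    """Return the catalog that would be open; raise if `table` names another."""
    catalog = default_catalog(filepath_from_uri(uri))
    parts = split_table_path(table)
    if len(parts) == 3 and parts[0] != catalog:
        raise LookupError(f'Catalog "{parts[0]}" does not exist! (open: {catalog!r})')
    return catalog


uri = "duckdb://test.duckdb"
parsed = urlparse(uri)
print(parsed.netloc, repr(parsed.path))  # test.duckdb ''
print(default_catalog("test.duckdb"))  # test
try:
    check_catalog(uri, "test.main.test_table")
except LookupError as exc:
    print(exc)  # Catalog "test" does not exist! (open: 'memory')
```

By contrast, `urlparse("duckdb:///test.duckdb").path` is `/test.duckdb`, an absolute path, so for any connector that reads the path component the number of slashes in the URI decides which file, if any, is opened.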
Describe the environment
Windows 11 Pro
Python 3.12.2
data_diff 0.11.1
duckdb 0.10.2
glebmezh commented
Hi @cmcnicoll,
Thank you for trying out data-diff and for taking the time to open this issue. We have made the hard decision to sunset the data-diff package and won't provide further development or support. Diffing functionality will continue to be available in Datafold Cloud; however, the DuckDB connector is not yet supported there (it is on the roadmap).
Feel free to contact us at support@datafold.com if you have any questions.
-Gleb