Joindiff on the same table failng - Clickhouse
hamer101 opened this issue · 4 comments
Comparing a table with itself with joindiff in Clickhouse with the command:
data-diff "clickhouse://clickhouse:Password1@localhost:9000/clickhouse" student student -k id -c deathdate -v
Results in:
10:31:05 INFO [Clickhouse] Starting a threadpool, size=1. base.py:1196
INFO [Clickhouse] Schema = {'id': ('id', 'UInt32', None, 32, 0), 'first_name': ('first_name', 'String', None, None, None), 'last_name': ('last_name', 'String', None, None, None), 'email': ('email', 'String', None, None, None), 'gender': schema.py:12
('gender', 'String', None, None, None), 'birthdate': ('birthdate', 'Date', 0, None, None), 'deathdate': ('deathdate', 'Nullable(Date)', 0, None, None)}
INFO [Clickhouse] Schema = {'id': ('id', 'UInt32', None, 32, 0), 'first_name': ('first_name', 'String', None, None, None), 'last_name': ('last_name', 'String', None, None, None), 'email': ('email', 'String', None, None, None), 'gender': schema.py:12
('gender', 'String', None, None, None), 'birthdate': ('birthdate', 'Date', 0, None, None), 'deathdate': ('deathdate', 'Nullable(Date)', 0, None, None)}
INFO Diffing schemas... __main__.py:81
INFO Diffing using columns: key=('id',) update=None extra=('deathdate',). __main__.py:518
INFO Using algorithm 'joindiff'. __main__.py:519
INFO [Clickhouse] Schema = {'id': Integer(_notes=[], precision=0, python_type=<class 'int'>), 'deathdate': UnknownColType(_notes=[], text='Nullable(Date)')} schema.py:12
INFO [Clickhouse] Schema = {'id': Integer(_notes=[], precision=0, python_type=<class 'int'>), 'deathdate': UnknownColType(_notes=[], text='Nullable(Date)')} schema.py:12
INFO Validating that the are no duplicate keys in columns: ['id'] joindiff_tables.py:243
INFO Validating that the are no duplicate keys in columns: ['id'] joindiff_tables.py:243
INFO Diffing segments at key-range: (1)..(1001). size: table1 <= 1000, table2 <= 1000 diff_tables.py:294
INFO . Diffing segment 1/32, key-range: (1)..(32), size <= None joindiff_tables.py:183
INFO . Diffing segment 2/32, key-range: (32)..(63), size <= None joindiff_tables.py:183
INFO . Diffing segment 3/32, key-range: (63)..(94), size <= None joindiff_tables.py:183
INFO . Diffing segment 4/32, key-range: (94)..(125), size <= None joindiff_tables.py:183
INFO . Diffing segment 5/32, key-range: (125)..(156), size <= None joindiff_tables.py:183
INFO . Diffing segment 6/32, key-range: (156)..(187), size <= None joindiff_tables.py:183
INFO . Diffing segment 7/32, key-range: (187)..(218), size <= None joindiff_tables.py:183
INFO . Diffing segment 8/32, key-range: (218)..(249), size <= None joindiff_tables.py:183
INFO . Diffing segment 9/32, key-range: (249)..(280), size <= None joindiff_tables.py:183
INFO . Diffing segment 10/32, key-range: (280)..(311), size <= None joindiff_tables.py:183
INFO . Diffing segment 11/32, key-range: (311)..(342), size <= None joindiff_tables.py:183
INFO . Diffing segment 12/32, key-range: (342)..(373), size <= None joindiff_tables.py:183
INFO . Diffing segment 13/32, key-range: (373)..(404), size <= None joindiff_tables.py:183
INFO . Diffing segment 14/32, key-range: (404)..(435), size <= None joindiff_tables.py:183
INFO . Diffing segment 15/32, key-range: (435)..(466), size <= None joindiff_tables.py:183
INFO . Diffing segment 16/32, key-range: (466)..(497), size <= None joindiff_tables.py:183
INFO . Diffing segment 17/32, key-range: (497)..(528), size <= None joindiff_tables.py:183
INFO . Diffing segment 18/32, key-range: (528)..(559), size <= None joindiff_tables.py:183
INFO . Diffing segment 19/32, key-range: (559)..(590), size <= None joindiff_tables.py:183
INFO . Diffing segment 20/32, key-range: (590)..(621), size <= None joindiff_tables.py:183
INFO . Diffing segment 21/32, key-range: (621)..(652), size <= None joindiff_tables.py:183
INFO . Diffing segment 22/32, key-range: (652)..(683), size <= None joindiff_tables.py:183
INFO . Diffing segment 23/32, key-range: (683)..(714), size <= None joindiff_tables.py:183
INFO . Diffing segment 24/32, key-range: (714)..(745), size <= None joindiff_tables.py:183
INFO . Diffing segment 25/32, key-range: (745)..(776), size <= None joindiff_tables.py:183
INFO . Diffing segment 26/32, key-range: (776)..(807), size <= None joindiff_tables.py:183
INFO . Diffing segment 27/32, key-range: (807)..(838), size <= None joindiff_tables.py:183
INFO . Diffing segment 28/32, key-range: (838)..(869), size <= None joindiff_tables.py:183
INFO . Diffing segment 29/32, key-range: (869)..(900), size <= None joindiff_tables.py:183
INFO . Diffing segment 30/32, key-range: (900)..(931), size <= None joindiff_tables.py:183
INFO . Diffing segment 31/32, key-range: (931)..(962), size <= None joindiff_tables.py:183
INFO . Diffing segment 32/32, key-range: (962)..(1001), size <= None
10:31:06 ERROR Code: 62. __main__.py:348
DB::Exception: Syntax error: failed at position 134 ('distinct'): distinct from "tmp2"."id" THEN 1 ELSE 0 END AS "is_diff_id", CASE WHEN "tmp1"."deathdate" is distinct from "tmp2"."deathdate" THEN 1 ELSE 0 END AS "is_diff_deat. Expected
one of: NOT DISTINCT FROM, NULL, NOT NULL, THEN, OR, AND, IS NOT DISTINCT FROM, IS NULL, IS NOT NULL, BETWEEN, NOT BETWEEN, LIKE, ILIKE, NOT LIKE, NOT ILIKE, REGEXP, IN, NOT IN, GLOBAL IN, GLOBAL NOT IN, MOD, DIV. Stack trace:
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000cdcf077 in /usr/bin/clickhouse
1. DB::Exception::createDeprecated(String const&, int, bool) @ 0x00000000079b6bad in /usr/bin/clickhouse
2. DB::parseQueryAndMovePosition(DB::IParser&, char const*&, char const*, String const&, bool, unsigned long, unsigned long) @ 0x00000000142afedc in /usr/bin/clickhouse
3. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x0000000012bdd397 in /usr/bin/clickhouse
4. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x0000000012bdc65e in /usr/bin/clickhouse
5. DB::TCPHandler::runImpl() @ 0x0000000013aa16a9 in /usr/bin/clickhouse
6. DB::TCPHandler::run() @ 0x0000000013ab6059 in /usr/bin/clickhouse
7. Poco::Net::TCPServerConnection::start() @ 0x000000001658e334 in /usr/bin/clickhouse
8. Poco::Net::TCPServerDispatcher::run() @ 0x000000001658f531 in /usr/bin/clickhouse
9. Poco::PooledThread::run() @ 0x000000001669ac47 in /usr/bin/clickhouse
10. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000016698edc in /usr/bin/clickhouse
11. ? @ 0x00007ff1583b6609 in ?
12. ? @ 0x00007ff1582db133 in ?
I attach the full trace here.
I'm running data-diff version 0.9.17.
have the same issue, cant diff tables in Clickhouse((
This construction not work in CH:
...
CASE WHEN "tmp1"."id" is distinct from "tmp2"."id" THEN 1 ELSE 0 END AS "is_diff_id",
...
Is thereis exist way to handle this?
Currently seems just need add to clickhouse dialect (clickhouse.py)
def is_distinct_from(self, a: str, b: str) -> str:
return f"(({a} IS NULL) != ({b} IS NULL)) OR ({a}!={b})"
This issue has been marked as stale because it has been open for 60 days with no activity. If you would like the issue to remain open, please comment on the issue and it will be added to the triage queue. Otherwise, it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment and it will be reopened for triage.