datediff -> DATE_DIFF
andrewhong5297 opened this issue · 6 comments
Seeing an issue where for some reason translation of date diff isn't quite correct. Giving the trino error: USER_ERROR line 6:76: Column 'day' cannot be resolved
Postgres/Spark input:
WITH date_range AS ( SELECT date(min(block_time)) + interval '1' day * seq AS date FROM ethereum.logs, UNNEST(sequence(0, datediff(day, min(block_time), max(block_time)))) AS t(seq) WHERE contract_address = 0x81f8f0bb1cB2A06649E51913A151F0E7Ef6FA321 ), daily_transactions AS ( SELECT date(block_time) AS date, count(*) AS num_transactions FROM ethereum.logs WHERE contract_address = 0x81f8f0bb1cB2A06649E51913A151F0E7Ef6FA321 GROUP BY date(block_time) ) SELECT dr.date, COALESCE(dt.num_transactions, 0) AS num_transactions FROM date_range dr LEFT JOIN daily_transactions dt ON dr.date = dt.date ORDER BY dr.date;
Trino:
/* Success! If you're still running into issues, check out https://dune.com/docs/query/syntax-differences/ or reach out in the #dune-sql Discord channel. */
WITH date_range AS ( SELECT DATE(MIN(block_time)) + INTERVAL '1' day * seq AS date FROM ethereum.logs, UNNEST(SEQUENCE(0, DATE_DIFF('day', MIN(block_time), day)) /* WARNING: Check out the docs for example of time series generation: https://dune.com/docs/query/syntax-differences/ */) AS t(seq) WHERE contract_address = 0x81f8f0bb1cb2a06649e51913a151f0e7ef6fa321 ), daily_transactions AS ( SELECT DATE(block_time) AS date, COUNT(*) AS num_transactions FROM ethereum.logs WHERE contract_address = 0x81f8f0bb1cb2a06649e51913a151f0e7ef6fa321 GROUP BY DATE(block_time) ) SELECT dr.date, COALESCE(dt.num_transactions, 0) AS num_transactions FROM date_range AS dr LEFT JOIN daily_transactions AS dt ON dr.date = dt.date ORDER BY dr.date
Great, thanks! It seems like it's swapping the order of the arguments and not quoting them correctly:
>>> sqlglot.transpile("select datediff('unit', from_, to)", read="postgres", write="trino")
["SELECT DATE_DIFF('to', from_, 'unit')"]
Hey 👋
I don't think that Postgres SQL supports the DATEDIFF
function. Maybe this is a UDF? Also, Spark's documentation indicates that its DATEDIFF
function accepts two arguments.
Are these queries valid, i.e. can they run on the corresponding engines? Happy to help if this is a SQLGlot issue!
Interestingly, spark-sql
seems to actually support the unit
argument:
spark-sql> SELECT DATEDIFF(day, '2020-01-01', '2020-01-05');
timestampdiff(day, 2020-01-01, 2020-01-05)
4
Time taken: 0.206 seconds, Fetched 1 row(s)
spark-sql> SELECT DATEDIFF(day, '2020-01-01', '2020-02-05');
timestampdiff(day, 2020-01-01, 2020-02-05)
35
Time taken: 0.168 seconds, Fetched 1 row(s)
spark-sql> SELECT DATEDIFF(month, '2020-01-01', '2020-02-05');
timestampdiff(month, 2020-01-01, 2020-02-05)
1
Time taken: 0.163 seconds, Fetched 1 row(s)
So this might be a documentation issue.. Verified that we get back a "function does not exist" error for postgres, though.
Ahh, it very well might be that it's a UDF! We have a fair amount of UDFs on the Postgres clusters. Will check. Thanks!
Confirmed that we don't actually have this in Postgres, so this would only be for Spark.
Interesting that it doesn't exist in the documentation!
Cool, since it runs on spark-sql
& it's also supported by databricks, I think I'll just add it in Hive so that they're all covered. Thanks for checking!