Bug: `FromUnixtimeOperatorTransformer` possibly built on false suppositions.
findinpath opened this issue · 2 comments
The FromUnixtimeOperatorTransformer from https://github.com/linkedin/coral/pull/426/files#diff-e26e325f65fb1f3341468a92d2e1051ee2b8b94b56c74203054aca3b2c790e4e seems to be built on false premises.
The Trino function from_unixtime (https://trino.io/docs/current/functions/datetime.html#from_unixtime) returns a timestamp(3) with time zone.
However, the FromUnixtimeOperatorTransformer transforms from_unixtime expressions to
FORMAT_DATETIME(FROM_UNIXTIME(10000), 'yyyy-MM-dd HH:mm:ss')
which is a varchar.
This causes statements like the following to fail:
select
  CAST(
    CAST(
      "at_timezone" (
        "format_datetime" (
          "from_unixtime" (
            CAST(
              "test_from_utc_timestamp_source"."source_float" AS DOUBLE
            )
          ),
          'yyyy-MM-dd HH:mm:ss'
        ),
        "$canonicalize_hive_timezone_id" ('America/Los_Angeles')
      ) AS TIMESTAMP (3)
    ) AS VARCHAR (65535)
  ) AS "ts_float"
from test_from_utc_timestamp_source
with the exception:
Caused by: io.trino.spi.TrinoException: Unexpected parameters (varchar, varchar) for function at_timezone. Expected: at_timezone(timestamp(p) with time zone, interval day to second), at_timezone(timestamp(p) with time zone, varchar(x))
Related PR: #426
Steps to reproduce the issue
hive
CREATE TABLE test_from_utc_timestamp_source (source_float float);
CREATE VIEW test_from_utc_timestamp_view AS SELECT CAST(from_utc_timestamp(source_float, 'America/Los_Angeles') AS STRING) ts_float FROM test_from_utc_timestamp_source;
select * from test_from_utc_timestamp_view;
-- 1970-01-30 21:30:00
trino
insert into hive.default.test_from_utc_timestamp_source values (2592000.0);
trino version 420 while using coral version 2.2.9
SELECT * FROM hive.default.test_from_utc_timestamp_view;
Query 20231013_184405_00000_fa5un failed: line 1:15: Failed analyzing stored view 'hive.default.test_from_utc_timestamp_view': line 1:18: Unexpected parameters (varchar, varchar) for function at_timezone. Expected: at_timezone(timestamp(p) with time zone, varchar(x)), at_timezone(timestamp(p) with time zone, interval day to second)
trinoSql in ViewReaderUtil
SELECT CAST(CAST("at_timezone"("format_datetime"("from_unixtime"(CAST("test_from_utc_timestamp_source"."source_float" AS DOUBLE)), 'yyyy-MM-dd HH:mm:ss'), "$canonicalize_hive_timezone_id"('America/Los_Angeles')) AS TIMESTAMP(3)) AS VARCHAR(65535)) AS "ts_float"
FROM "default"."test_from_utc_timestamp_source" AS "test_from_utc_timestamp_source"
trino version 420 while using coral version 2.1.5
trino> SELECT * FROM hive.default.test_from_utc_timestamp_view;
ts_float
-------------------------
1970-01-30 16:00:00.000
trinoSql in ViewReaderUtil
SELECT CAST(CAST("at_timezone"("from_unixtime"(CAST("test_from_utc_timestamp_source"."source_float" AS DOUBLE)), "$canonicalize_hive_timezone_id"('America/Los_Angeles')) AS TIMESTAMP(3)) AS VARCHAR(65535)) AS "ts_float"
FROM "default"."test_from_utc_timestamp_source" AS "test_from_utc_timestamp_source"
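The 2.1.5 result can be checked by hand: 2592000 seconds is exactly 30 days after the epoch, i.e. 1970-01-31 00:00:00 UTC, which is 1970-01-30 16:00:00 in Los Angeles. The following Python sketch only mimics the chain of Trino calls with the standard library; the helper name is made up for illustration, and it assumes a fixed UTC-8 offset for America/Los_Angeles in January 1970 rather than a time zone database lookup.

```python
from datetime import datetime, timedelta, timezone

# Assumption: America/Los_Angeles was on PST (UTC-8) in January 1970, so a
# fixed offset stands in for a real tz database lookup.
PST = timezone(timedelta(hours=-8), "PST")


def at_timezone_as_varchar(epoch_seconds: float, zone: timezone) -> str:
    """Mimic CAST(CAST(at_timezone(from_unixtime(x), zone) AS TIMESTAMP(3)) AS VARCHAR)."""
    # from_unixtime: seconds since epoch -> instant carrying a time zone
    instant = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # at_timezone: same instant, viewed in the requested zone
    local = instant.astimezone(zone)
    # CAST ... AS TIMESTAMP(3): keep the wall-clock value, drop the zone
    wall_clock = local.replace(tzinfo=None)
    # CAST ... AS VARCHAR: render with millisecond precision
    return wall_clock.isoformat(sep=" ", timespec="milliseconds")


ts_float = at_timezone_as_varchar(2592000.0, PST)
print(ts_float)  # 1970-01-30 16:00:00.000
```

This matches the value Trino returns with coral 2.1.5. It only works because the value handed to at_timezone is still a timestamp with a time zone and not a formatted string.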
2.1.5: CAST(CAST("at_timezone"("from_unixtime"...
2.2.9: CAST(CAST("at_timezone"("format_datetime"("from_unixtime"
format_datetime shouldn't be there.
FromUtcTimestampOperatorTransformer works with both:
- from_unixtime_nanos
- from_unixtime
from_unixtime_nanos is a Trino-specific function https://trino.io/docs/current/functions/datetime.html?highlight=from_unixtime_nanos#from_unixtime_nanos
Translation of from_unixtime_nanos works as expected.
from_unixtime is:
- a Trino-specific function https://trino.io/docs/current/functions/datetime.html?highlight=from_unixtime#from_unixtime returning timestamp(3) with time zone
- a Hive-specific function https://cwiki.apache.org/confluence/display/hive/languagemanual+udf returning string type
string from_unixtime(bigint unixtime[, string pattern])
Converts a number of seconds since epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current time zone (using config "hive.local.time.zone") using the specified pattern. If the pattern is missing the default is used ('uuuu-MM-dd HH:mm:ss' or 'yyyy-MM-dd HH:mm:ss'). Example: from_unixtime(0)=1970-01-01 00:00:00 (hive.local.time.zone=Etc/GMT)
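To make the difference between the two same-named functions concrete, here is a small Python sketch. Both helper names are hypothetical, UTC stands in for both `hive.local.time.zone` and the Trino session zone, and only the default pattern is handled; it illustrates the return types and is not a reimplementation of either engine.

```python
from datetime import datetime, timezone


def hive_style_from_unixtime(epoch_seconds: int, tz: timezone = timezone.utc) -> str:
    """Hive semantics: seconds since epoch -> formatted string (default pattern only)."""
    return datetime.fromtimestamp(epoch_seconds, tz=tz).strftime("%Y-%m-%d %H:%M:%S")


def trino_style_from_unixtime(epoch_seconds: float) -> datetime:
    """Trino semantics: seconds since epoch -> timestamp that still carries a zone."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


hive_value = hive_style_from_unixtime(0)    # a plain string
trino_value = trino_style_from_unixtime(0)  # a zone-aware timestamp
print(repr(hive_value))  # '1970-01-01 00:00:00'
```

The Hive-style result reproduces the documented example from_unixtime(0)=1970-01-01 00:00:00, but as a string it can no longer be shifted to another zone. Only the Trino-style value is a valid input for something like at_timezone.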
Given the fact that from_unixtime is a Hive function, it gets its own FromUnixtimeOperatorTransformer, and that's where our problem actually occurs.
When we transform from_utc_timestamp via FromUtcTimestampOperatorTransformer, we create (when dealing with floats) a nested from_unixtime call. That nested call unintentionally gets transformed as well, so the output becomes "at_timezone"("format_datetime"("from_unixtime" instead of the intended "at_timezone"("from_unixtime" .
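The interaction can be reproduced with a toy rewriter. This is a sketch under stated assumptions, not Coral's actual code: the `Call` node, the two rule functions, and the rough type model are all invented for illustration. It assumes the two rewrites run one after the other over the whole tree, and it leaves out the CAST and "$canonicalize_hive_timezone_id" wrappers.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union


@dataclass(frozen=True)
class Call:
    name: str
    args: Tuple["Expr", ...]


Expr = Union[Call, str]  # str leaves hold column names or quoted SQL literals


def apply_bottom_up(expr: Expr, rule: Callable[[Call], Expr]) -> Expr:
    """Rewrite children first, then offer the rebuilt node to the rule once."""
    if not isinstance(expr, Call):
        return expr
    rebuilt = Call(expr.name, tuple(apply_bottom_up(a, rule) for a in expr.args))
    return rule(rebuilt)


def from_utc_timestamp_rule(call: Call) -> Expr:
    # Toy stand-in for FromUtcTimestampOperatorTransformer (float input case).
    if call.name == "from_utc_timestamp":
        value, zone = call.args
        return Call("at_timezone", (Call("from_unixtime", (value,)), zone))
    return call


def from_unixtime_rule(call: Call) -> Expr:
    # Toy stand-in for FromUnixtimeOperatorTransformer: it cannot tell a Hive
    # from_unixtime apart from one that an earlier rewrite generated for Trino.
    if call.name == "from_unixtime":
        return Call("format_datetime", (call, "'yyyy-MM-dd HH:mm:ss'"))
    return call


def render(expr: Expr) -> str:
    if isinstance(expr, Call):
        return f"{expr.name}({', '.join(render(a) for a in expr.args)})"
    return expr


def result_type(expr: Expr) -> str:
    """Very rough type model for the three Trino functions involved."""
    if not isinstance(expr, Call):
        return "varchar" if expr.startswith("'") else "double"
    if expr.name == "from_unixtime":
        return "timestamp(3) with time zone"
    if expr.name == "format_datetime":
        return "varchar"
    if expr.name == "at_timezone":
        first = result_type(expr.args[0])
        if not first.endswith("with time zone"):
            raise TypeError(
                f"Unexpected parameters ({first}, varchar) for function at_timezone"
            )
        return first
    raise ValueError(f"unknown function {expr.name}")


hive_expr = Call("from_utc_timestamp", ("source_float", "'America/Los_Angeles'"))
after_first_rule = apply_bottom_up(hive_expr, from_utc_timestamp_rule)
after_both_rules = apply_bottom_up(after_first_rule, from_unixtime_rule)

print(render(after_first_rule))
print(render(after_both_rules))
```

After the first rule alone, the tree type-checks in this model. After the second rule, `result_type(after_both_rules)` raises a `TypeError` that mirrors the Trino message, because the generated from_unixtime call was rewritten as if it had come from the Hive view text.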