Mock test environment doesn't test Trino/Iceberg limitations
MichaelTiemannOSC opened this issue · 2 comments
MichaelTiemannOSC commented
# This passes Mock test, but fails when used in Trino/Iceberg environment
def test_trino_pandas_insert():
import pandas as pd
# mock up an sqlalchemy table
tbl = mock.MagicMock()
tbl.name = "test_pandas"
# mock up an sqlalchemy Connnection
cxn = mock.MagicMock()
df = pd.DataFrame(
{"A": [4.5], "B'C": [math.nan], None: [math.inf], "D": [-math.inf], "E": [datetime(2022, 1, 1)], ":F": [1.0]}
).convert_dtypes()
assert (df.dtypes == ['Float64', 'Int64', 'Float64', 'Float64', 'datetime64[ns]', 'Int64']).all()
df.to_sql(
tbl.name,
con=cxn,
schema="test",
if_exists="append",
index=False,
method=TrinoBatchInsert(batch_size=5, verbose=True),
)
The failure is that datetime64[ns] is translated to timestamp(3) by Pandas, but Iceberg wants timestamp(6). I have tried to figure out the correct dtype argument to pass to to_sql, but everything I've tried passing so far resolves to timestamp(3) inside of Pandas. I need to figure out how to pass it something that will resolve to timestamp(6).
MichaelTiemannOSC commented
Some hacking required: trinodb/trino-python-client#107
erikerlandson commented
Will trinodb/trino-python-client#181 resolve?