os-climate/osc-ingest-tools

Mock test environment doesn't test Trino/Iceberg limitations

MichaelTiemannOSC opened this issue · 2 comments

 # This passes Mock test, but fails when used in Trino/Iceberg environment

def test_trino_pandas_insert():
    import pandas as pd
    # mock up an sqlalchemy table
    tbl = mock.MagicMock()
    tbl.name = "test_pandas"
    # mock up an sqlalchemy Connnection
    cxn = mock.MagicMock()
    df = pd.DataFrame(
        {"A": [4.5], "B'C": [math.nan], None: [math.inf], "D": [-math.inf], "E": [datetime(2022, 1, 1)], ":F": [1.0]}
    ).convert_dtypes()
    assert (df.dtypes == ['Float64', 'Int64', 'Float64', 'Float64', 'datetime64[ns]', 'Int64']).all()
    df.to_sql(
        tbl.name,
        con=cxn,
        schema="test",
        if_exists="append",
        index=False,
        method=TrinoBatchInsert(batch_size=5, verbose=True),
    )

The failure is that datetime64[ns] is translated to timestamp(3) by Pandas, but Iceberg wants timestamp(6). I have tried to figure out the correct dtype argument to pass to to_sql, but everything I've tried passing so far resolves to timestamp(3) inside of Pandas. I need to figure out how to pass it something that will resolve to timestamp(6).