snowflakedb/snowpark-python

SNOW-1511601: [local testing] Snowpark 1.18.0 broke inserts using current_timestamp() in timestamp columns

mraraujo opened this issue · 9 comments

Code such as the following works in 1.17.0 but throws an error in 1.18.0:

        merge_result = df.merge(
            source,
            cast(Column, (df['id'] == source['id'])),
            [when_not_matched().insert({
                'id': source['id'],
                'last_update': current_timestamp()
            })]
        )

The error is

    def write_table(
        self, name: Union[str, Iterable[str]], table: TableEmulator, mode: SaveMode
    ) -> Row:
        for column in table.columns:
            if not table[column].sf_type.nullable and table[column].isnull().any():
>               raise SnowparkLocalTestingException(
                    "NULL result in a non-nullable column"
                )
E               snowflake.snowpark.mock.exceptions.SnowparkLocalTestingException: NULL result in a non-nullable column

Note that replacing current_timestamp() with datetime.today() succeeds, but the behavior is not equivalent outside of local testing, since current_timestamp() should evaluate in the warehouse.
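For context, the local-testing backend validates nullability when writing the merged table (the write_table check in the traceback above). A simplified, standalone sketch of that mechanism, using hypothetical names rather than the real snowflake.snowpark.mock internals, illustrates why an unevaluated current_timestamp() would trip it:

```python
# Hypothetical, simplified sketch: the real check operates on a
# pandas-backed TableEmulator inside snowflake.snowpark.mock.

class SnowparkLocalTestingException(Exception):
    pass

def check_non_nullable(rows, nullable_by_column):
    """Raise if any non-nullable column contains a None value.

    rows: list of dicts mapping column name -> value
    nullable_by_column: dict mapping column name -> bool
    """
    for column, nullable in nullable_by_column.items():
        if not nullable and any(row.get(column) is None for row in rows):
            raise SnowparkLocalTestingException(
                "NULL result in a non-nullable column"
            )

# If current_timestamp() is never evaluated for the inserted row, the new
# 'last_update' value stays None and the check fires:
rows = [
    {"id": "id1", "last_update": "2024-03-01 00:00:00"},
    {"id": "id2", "last_update": None},  # inserted row, timestamp unevaluated
]
try:
    check_non_nullable(rows, {"id": False, "last_update": False})
except SnowparkLocalTestingException as e:
    print(e)  # NULL result in a non-nullable column
```

This matches the observed behavior: the workaround of passing datetime.today() succeeds because it supplies a concrete client-side value instead of a deferred expression.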

Hello @mraraujo ,

Thanks for raising the issue; we are looking into it and will update.

Regards,
Sujan

Hello @mraraujo ,

Could you share a code snippet so we can reproduce the issue?

Regards,
Sujan

A small reproducible example:

from snowflake.snowpark import Session, Column
from snowflake.snowpark.functions import current_timestamp, when_not_matched
from snowflake.snowpark.types import StructType, StructField, StringType, TimestampType, TimestampTimeZone
import datetime as dt
from typing import cast

session = Session.builder.config('local_testing', True).create()

target_data = [
    ("id1", dt.datetime(year=2024, month=3, day=1)),
]
target_schema = StructType([
    StructField("id", StringType()),
    StructField("last_update", TimestampType(TimestampTimeZone.NTZ)),
])

source_data = [
    ("id1",),
    ("id2",),
]

target = session.create_dataframe(target_data, schema=target_schema)
target.write.save_as_table('target', mode='overwrite')
target = session.table('target')

source = session.create_dataframe(source_data, schema=StructType([StructField("id", StringType())]))

merge_result = target.merge(
    source,
    cast(Column, (target['id'] == source['id'])),
    [when_not_matched().insert({
        'id': source['id'],
        'last_update': current_timestamp()
    })]
)

Note that the above snippet runs under Snowpark 1.17.0 but throws an exception on versions >= 1.18.0.

Hello @mraraujo ,

Thank you for the code snippet. We are able to reproduce the issue and confirm it is specific to local_testing; it works fine with a standalone session.

The output with a standalone session, where the merge happens properly:

    Target table:
    ------------------------------------
    |"ID"  |"LAST_UPDATE"              |
    ------------------------------------
    |id1   |2024-03-01 00:00:00        |
    ------------------------------------

    Source table:
    --------
    |"ID"  |
    --------
    |id1   |
    |id2   |
    --------

    Merged result:
    ------------------------------------
    |"ID"  |"LAST_UPDATE"              |
    ------------------------------------
    |id1   |2024-03-01 00:00:00        |
    |id2   |2024-07-08 02:36:51.947000 |
    ------------------------------------

Whereas with local_testing, the same merge fails with 'SnowparkLocalTestingException: NULL result in a non-nullable column'.

We will work on eliminating it.

Regards,
Sujan

Thanks for the report. Confirmed: this is a bug in our insertion logic during the merge operation.
I have a PR out to fix the issue: #1949

I have merged the PR; it will be included in our next release.
If you'd like to try it out now, you can install from the git main branch.

I'm closing the issue now. FYI, we expect to cut the release late this week or early next.