meltano/sdk

bug: JSON serialization of `date` types causes schema validation to fail in `target-postgres`

Opened this issue · 8 comments

Singer SDK Version

0.30.0

Meltano version 2.19.1

Is this a regression?

  • Yes

Python Version

3.10

Bug scope

Targets (data type handling, batching, SQL object generation, etc.)

Operating System

Linux, Mac OS

Description

When syncing data between tap-snowflake and target-postgres this line causes JSON schema validation to fail because the inferred json schema from Snowflake is a date but this coerces the type into a datetime.

To summarize

  • Column is of type date in Snowflake
  • Target column is of type date in Postgres
  • Because of the type conformation code schema validation fails in the target because it is expecting format: string type: date and instead receives format: string type: date-time.

Why is that extra timestamp string being added on?

Additionally you can verify that this is a problem by syncing any snowflake date colum into target-jsonl and see that it is being serialized as a timestamp instead of a regular date even though

json.dumps(datetime.date(<year>,<month>,<day>).isoformat())

produces perfectly valid JSON. There's probably a reason that little string is added where it is but it does cause some problems for Meltano.

Code

2023-07-10T09:25:22.412171Z [info     ] jsonschema.exceptions.ValidationError: '2019-08-31T00:00:00+00:00' is not a 'date' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412300Z [info     ]                                cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412421Z [info     ] Failed validating 'format' in schema['properties']['expiration_date']: cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412552Z [info     ]     {'format': 'date', 'type': ['string', 'null']} cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412667Z [info     ]                                cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412781Z [info     ] On instance['expiration_date']: cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412895Z [info     ]     '2019-08-31T00:00:00+00:00' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.476401Z [error    ] Loader failed
2023-07-10T09:25:22.476859Z [error    ] Block run completed.           block_type=ExtractLoadBlocks err=RunnerError('Loader failed') exit_codes={<PluginType.LOADERS: 'loaders'>: 1} set_number=0 success=False

@edgarrmondragon is this related / similar to the Decimal bugs we were seeing?

@edgarrmondragon is this related / similar to the Decimal bugs we were seeing?

@tayloramurphy nope, this is a decision made very early in the sdk history to add T00:00:00+00:00 to all date values:

return f"{elem.isoformat()}T00:00:00+00:00"

I don't recall why that's the case, and I don't know if removing it would result in a breaking change for any taps/targets.

@edgarrmondragon this is a real Chesterton's fence situation... it doesn't really make sense to me why it would coerce a date to a datetime, particularly when those are distinct types. Thoughts on what a possible mitigation would be? Should this be configurable by the end user?

visch commented

https://github.com/MeltanoLabs/tap-postgres/pull/172/files One possible implementation to fix it :D

@tayloramurphy @edgarrmondragon I also just ran into this when using tap-snowflake and created MeltanoLabs/tap-snowflake#26 before I knew this issue existed. It looks like this is the old MR that added it originally https://gitlab.com/meltano/sdk/-/merge_requests/42 but its a giant release MR so its hard to grok.

I think a reasonable solution inspired by Derek's implementation would be to encapsulate all the conforming methods into a dedicated class, add it to the base stream class, and allow tap developers to override the primitive-level conforming method.

I think I have a solution for this that

  1. Maintains backwards compatibility with whatever relied on this originally (Chesterton's fence)
  2. Makes type coercion easily extensible so extractors can manage their own

The implementation consists on refactoring _conform_primitive_property using functools.singledispatch.

I ran into this issue as we are trying to migrate our integrations at Akkio to Meltano. I am trying to migrate snowflake and got an error when uploading to target-s3 from tap-snowflake since date type is expected. I am considering a patch similar to https://github.com/MeltanoLabs/tap-postgres/pull/172/files where we override _conform_primitive_property for now.