bug: JSON serialization of `date` types causes schema validation to fail in `target-postgres`
Opened this issue · 8 comments
Singer SDK Version
0.30.0
Meltano version 2.19.1
Is this a regression?
- Yes
Python Version
3.10
Bug scope
Targets (data type handling, batching, SQL object generation, etc.)
Operating System
Linux, Mac OS
Description
When syncing data between tap-snowflake
and target-postgres
this line causes JSON schema validation to fail because the inferred json schema from Snowflake is a date but this coerces the type into a datetime.
To summarize
- Column is of type
date
in Snowflake - Target column is of type
date
in Postgres - Because of the type conformation code schema validation fails in the target because it is expecting
format: string type: date
and instead receivesformat: string type: date-time
.
Why is that extra timestamp string being added on?
Additionally you can verify that this is a problem by syncing any snowflake date colum into target-jsonl and see that it is being serialized as a timestamp instead of a regular date even though
json.dumps(datetime.date(<year>,<month>,<day>).isoformat())
produces perfectly valid JSON. There's probably a reason that little string is added where it is but it does cause some problems for Meltano.
Code
2023-07-10T09:25:22.412171Z [info ] jsonschema.exceptions.ValidationError: '2019-08-31T00:00:00+00:00' is not a 'date' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412300Z [info ] cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412421Z [info ] Failed validating 'format' in schema['properties']['expiration_date']: cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412552Z [info ] {'format': 'date', 'type': ['string', 'null']} cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412667Z [info ] cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412781Z [info ] On instance['expiration_date']: cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.412895Z [info ] '2019-08-31T00:00:00+00:00' cmd_type=elb consumer=True name=target-postgres producer=False stdio=stderr string_id=target-postgres
2023-07-10T09:25:22.476401Z [error ] Loader failed
2023-07-10T09:25:22.476859Z [error ] Block run completed. block_type=ExtractLoadBlocks err=RunnerError('Loader failed') exit_codes={<PluginType.LOADERS: 'loaders'>: 1} set_number=0 success=False
@edgarrmondragon is this related / similar to the Decimal bugs we were seeing?
@edgarrmondragon is this related / similar to the Decimal bugs we were seeing?
@tayloramurphy nope, this is a decision made very early in the sdk history to add T00:00:00+00:00
to all date values:
sdk/singer_sdk/helpers/_typing.py
Line 478 in b02d698
I don't recall why that's the case, and I don't know if removing it would result in a breaking change for any taps/targets.
@edgarrmondragon this is a real Chesterton's fence situation... it doesn't really make sense to me why it would coerce a date to a datetime, particularly when those are distinct types. Thoughts on what a possible mitigation would be? Should this be configurable by the end user?
https://github.com/MeltanoLabs/tap-postgres/pull/172/files One possible implementation to fix it :D
@tayloramurphy @edgarrmondragon I also just ran into this when using tap-snowflake and created MeltanoLabs/tap-snowflake#26 before I knew this issue existed. It looks like this is the old MR that added it originally https://gitlab.com/meltano/sdk/-/merge_requests/42 but its a giant release MR so its hard to grok.
I think a reasonable solution inspired by Derek's implementation would be to encapsulate all the conforming methods into a dedicated class, add it to the base stream class, and allow tap developers to override the primitive-level conforming method.
I think I have a solution for this that
- Maintains backwards compatibility with whatever relied on this originally (Chesterton's fence)
- Makes type coercion easily extensible so extractors can manage their own
The implementation consists on refactoring _conform_primitive_property
using functools.singledispatch
.
I ran into this issue as we are trying to migrate our integrations at Akkio to Meltano. I am trying to migrate snowflake and got an error when uploading to target-s3 from tap-snowflake since date type is expected. I am considering a patch similar to https://github.com/MeltanoLabs/tap-postgres/pull/172/files where we override _conform_primitive_property for now.