meltano/sdk

bug: target existing sink check returns incorrect boolean when `add_metadata_columns=True`

Closed this issue · 1 comments

The comparison that the `get_sink` method makes between the existing sink's schema and the incoming Singer message schema returns an incorrect boolean when the `add_metadata_columns` config is true. The base sink automatically adds the metadata columns when it is initialized, so the sink's schema no longer matches the original Singer message schema that was provided. If the target receives another identical schema message in the same sync, it compares that Singer schema against the sink schema that now has metadata columns, so it always decides to drain and recreate the sink because it thinks the schema has changed.

For taps that emit lots of schema messages (I think this happens when child streams are implemented, e.g. tap-github), the target becomes very inefficient because it's constantly draining and recreating sinks, even if they only hold a few records and the schema hasn't changed.

Originally discovered during target-snowflake development, where it was fixed (https://github.com/MeltanoLabs/target-snowflake/blob/86ca8067636d0c82350f7868785fcc998e113dcb/target_snowflake/target.py#L89) by removing the metadata columns prior to doing the comparison.
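To illustrate the idea, here is a minimal, self-contained sketch of the problem and of the "strip metadata columns before comparing" approach. It is not the SDK's or target-snowflake's actual code: `strip_metadata_columns` and `schema_changed` are hypothetical helpers, and identifying metadata columns by an `_sdc_` prefix is an assumption made for the example.

```python
import copy

# Assumption for this sketch: metadata columns are identified by this prefix.
SDC_PREFIX = "_sdc_"


def strip_metadata_columns(schema: dict) -> dict:
    """Return a copy of the schema without metadata properties (hypothetical helper)."""
    stripped = copy.deepcopy(schema)
    props = stripped.get("properties", {})
    for name in [n for n in props if n.startswith(SDC_PREFIX)]:
        del props[name]
    return stripped


def schema_changed(sink_schema: dict, incoming_schema: dict) -> bool:
    """Compare schemas while ignoring metadata columns (hypothetical helper)."""
    return strip_metadata_columns(sink_schema) != strip_metadata_columns(incoming_schema)


# Schema as received in the Singer message.
incoming = {"type": "object", "properties": {"id": {"type": "integer"}}}

# Simulate what the sink holds after metadata columns were added on init.
sink_schema = copy.deepcopy(incoming)
sink_schema["properties"]["_sdc_extracted_at"] = {
    "type": ["null", "string"],
    "format": "date-time",
}

# Naive comparison: reports a change although the tap sent the same schema.
naive_changed = sink_schema != incoming

# Comparison with metadata columns removed first: no change reported.
fixed_changed = schema_changed(sink_schema, incoming)

# A genuine change (new "name" property) should still be detected.
altered = copy.deepcopy(incoming)
altered["properties"]["name"] = {"type": "string"}
real_change_detected = schema_changed(sink_schema, altered)
```

With the naive comparison, every repeated schema message looks like a schema change and would trigger a drain; with the metadata columns stripped first, only real differences do.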

I have a PR on the way that should fix this.