MeltanoLabs/tap-snowflake

Streamline schema name ignore for 'information_schema'

The proposal is to add a setting such as `excluded_schema_names` to streamline the override below, which excludes `information_schema` from catalog discovery:

```python
from __future__ import annotations

import sqlalchemy
from singer_sdk.connectors import SQLConnector


class SnowflakeConnector(SQLConnector):
    """Snowflake connector; only the discovery override is shown here."""

    # Overridden to filter out information_schema from catalog discovery.
    def discover_catalog_entries(self) -> list[dict]:
        """Return a list of catalog entries from discovery.

        Returns:
            The discovered catalog entries as a list.
        """
        result: list[dict] = []
        engine = self.create_sqlalchemy_engine()
        inspected = sqlalchemy.inspect(engine)
        # Skip information_schema, comparing names case-insensitively.
        schema_names = [
            schema_name
            for schema_name in self.get_schema_names(engine, inspected)
            if schema_name.lower() != "information_schema"
        ]
        for schema_name in schema_names:
            # Iterate through each table and view in the schema.
            for table_name, is_view in self.get_object_names(
                engine, inspected, schema_name
            ):
                catalog_entry = self.discover_catalog_entry(
                    engine, inspected, schema_name, table_name, is_view
                )
                result.append(catalog_entry.to_dict())
        return result
```
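A generalized version of the filter in the override above could read the excluded names from config instead of hard-coding `information_schema`. The following is a minimal standalone sketch, assuming the proposed `excluded_schema_names` setting; `filter_schema_names` and `DEFAULT_EXCLUDED_SCHEMAS` are hypothetical names, not existing SDK or tap-snowflake API.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical default for the proposed `excluded_schema_names` setting.
DEFAULT_EXCLUDED_SCHEMAS: tuple[str, ...] = ("information_schema",)


def filter_schema_names(
    schema_names: Iterable[str],
    excluded_schema_names: Iterable[str] | None = None,
) -> list[str]:
    """Drop excluded schemas, comparing names case-insensitively.

    None applies the default exclusion list; an explicit list (even an
    empty one) replaces the default entirely.
    """
    if excluded_schema_names is None:
        excluded_schema_names = DEFAULT_EXCLUDED_SCHEMAS
    # Lower-case both sides, as the original override does with .lower().
    excluded = {name.lower() for name in excluded_schema_names}
    return [name for name in schema_names if name.lower() not in excluded]


if __name__ == "__main__":
    schemas = ["PUBLIC", "INFORMATION_SCHEMA", "raw"]
    print(filter_schema_names(schemas))         # ['PUBLIC', 'raw']
    print(filter_schema_names(schemas, ["raw"]))  # ['PUBLIC', 'INFORMATION_SCHEMA']
    print(filter_schema_names(schemas, []))     # ['PUBLIC', 'INFORMATION_SCHEMA', 'raw']
```

Inside the override, the list comprehension could then become `filter_schema_names(self.get_schema_names(engine, inspected), self.config.get("excluded_schema_names"))`, again assuming that setting is declared in the tap's config schema.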

This should probably be an SDK feature so that tap developers can implement it more easily.

@kgpayne - Can you log or link to an issue in the SDK? Thanks!

@kgpayne - Possible generic solution here:

In this approach, the setting would need to be `ignore: ['information_schema-*']`. We could make that the default, and anyone adding their own rules would supply the full list, e.g. `ignore: ['information_schema-*', 'my-extra-ignores', ...]`. Alternatively, to include the `information_schema` tables in the sync, you could set an empty list: `ignore: []`.
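The `ignore` semantics described above can be sketched with glob matching on stream IDs. This is a minimal illustration, assuming stream IDs take the `<schema>-<table>` form that the pattern `information_schema-*` implies; `ignore`, `DEFAULT_IGNORE`, `is_ignored`, and `select_streams` are hypothetical names, not an existing SDK API.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from typing import Iterable, Sequence

# Hypothetical default for the proposed `ignore` setting.
DEFAULT_IGNORE: tuple[str, ...] = ("information_schema-*",)


def is_ignored(stream_id: str, ignore: Sequence[str] | None = None) -> bool:
    """Return True if stream_id matches any glob pattern in `ignore`.

    None means "use the default"; an explicit list, including [], replaces it.
    """
    patterns = DEFAULT_IGNORE if ignore is None else ignore
    # fnmatchcase never normalizes case, so lower-case both sides to get
    # case-insensitive matching that behaves the same on every OS.
    return any(fnmatchcase(stream_id.lower(), p.lower()) for p in patterns)


def select_streams(
    stream_ids: Iterable[str], ignore: Sequence[str] | None = None
) -> list[str]:
    """Keep only the stream IDs that no ignore pattern matches."""
    return [sid for sid in stream_ids if not is_ignored(sid, ignore)]


if __name__ == "__main__":
    ids = ["public-orders", "information_schema-tables", "raw-events_tmp"]
    print(select_streams(ids))  # ['public-orders', 'raw-events_tmp']
    print(select_streams(ids, ["information_schema-*", "*_tmp"]))  # ['public-orders']
    print(select_streams(ids, []))  # all three stream IDs
```

Treating `None` as "use the default" and `[]` as "ignore nothing" gives exactly the three cases in the comment: the default hides `information_schema`, a custom list must restate that pattern to keep hiding it, and an empty list syncs everything.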