astronomer/astro-sdk

Allow users to disable schema check & creation on `load_file`

tatiana opened this issue · 0 comments

Please describe the feature you'd like to see
As of Astro SDK 1.6.0, the `load_file` operation checks whether the target schema exists and, if it does not, attempts to create it.

Recently a user reported that the cost of checking if the schema exists is very high:
"I have a task that took 1:36 minutes to run, and it was 1:30 running the information schema query"

This was reported for Snowflake, but the same issue can apply to most of the supported databases.

Describe the solution you'd like
Users should be able to run `load_file` with a boolean argument `schema_exists`. For backwards compatibility, the default value should be `False`, which keeps the current behaviour: the SDK checks if the schema exists and creates it if needed. If this argument is `True`, the Python SDK does not check if the schema exists and does not attempt to create it.
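
To make the proposal concrete, here is a minimal sketch of how the flag could gate the check. It assumes `schema_exists=True` means the caller guarantees the schema is already there. `FakeDatabase`, `has_schema`, `create_schema`, `load` and `load_file_sketch` are hypothetical names used only for illustration; they are not the Astro SDK's actual internals.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDatabase:
    """Hypothetical stand-in for a database wrapper; records each query it runs."""

    existing_schemas: set = field(default_factory=set)
    executed: list = field(default_factory=list)

    def has_schema(self, schema: str) -> bool:
        # Stands in for the expensive information_schema lookup.
        self.executed.append(f"CHECK SCHEMA {schema}")
        return schema.lower() in self.existing_schemas

    def create_schema(self, schema: str) -> None:
        self.executed.append(f"CREATE SCHEMA {schema}")
        self.existing_schemas.add(schema.lower())

    def load(self, path: str, schema: str, table: str) -> None:
        self.executed.append(f"LOAD {path} INTO {schema}.{table}")


def load_file_sketch(db, path, schema, table, schema_exists=False):
    """Load a file, skipping the schema check when the caller vouches for the schema."""
    if not schema_exists:
        # Default, backwards-compatible path: check first, create if missing.
        if not db.has_schema(schema):
            db.create_schema(schema)
    # With schema_exists=True, neither the check nor the create query is issued.
    db.load(path, schema, table)
```

With `schema_exists=True`, the only statement recorded by the fake database is the load itself, which is the saving the reporting user is after. If the schema is in fact missing, the load would fail at the database level, so the flag shifts that responsibility to the caller.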

Are there any alternatives to this feature?

  1. Find a more efficient way to check if the schema exists than:
     `SELECT SCHEMA_NAME from information_schema.schemata WHERE LOWER(SCHEMA_NAME) = %(schema_name)s;`
  2. Have a more generic way of allowing users to disable "optional" queries run by the Astro SDK.
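
As a rough illustration of the second alternative, optional queries could be switched off by name through a single setting. This is only a sketch under stated assumptions: the environment variable `EXAMPLE_SKIP_OPTIONAL_QUERIES` and the query names such as `schema_check` are made up for this example and do not exist in the Astro SDK.

```python
import os

# Hypothetical variable name; not an existing Astro SDK setting.
SKIP_ENV_VAR = "EXAMPLE_SKIP_OPTIONAL_QUERIES"


def skipped_queries(environ=None):
    """Return the set of optional query names the user asked to skip."""
    env = os.environ if environ is None else environ
    raw = env.get(SKIP_ENV_VAR, "")
    # Comma-separated list, tolerant of extra whitespace and letter case.
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def should_run(query_name, environ=None):
    """Return True unless the named optional query has been disabled."""
    return query_name.lower() not in skipped_queries(environ)
```

An operator could then guard the schema lookup with `if should_run("schema_check"): ...`, and the same mechanism would cover any other optional query added later. The trade-off compared with a per-call `schema_exists` argument is that the setting applies globally rather than per task.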

Additional context
Follow up with customer on Slack: https://astronomer.slack.com/archives/C04L0HNK9ME/p1683231202383579?thread_ts=1682346906.404539&cid=C04L0HNK9ME

Acceptance Criteria

  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more, once available)
  • Integration tests (if the feature relates to a new database or external service)
  • Example DAG
  • Docstrings in reStructuredText for each method, class, function and module-level attribute (including an Example DAG showing how it should be used)
  • Improve the documentation (README, Sphinx, and any other relevant docs)