astronomer/airflow-provider-duckdb

Allow config to be passed as part of duckdb hook

Currently, the hook connects to DuckDB using only the file path, as per: https://github.com/astronomer/airflow-provider-duckdb/blob/main/duckdb_provider/hooks/duckdb_hook.py#L32 but duckdb.connect() also accepts a configuration. It takes the following three parameters:

def connect(database=None, read_only=False, config=None): # real signature unknown; restored from __doc__
    """
    connect(database: str = ':memory:', read_only: bool = False, config: object = None) -> duckdb.DuckDBPyConnection
    
    Create a DuckDB database instance. Can take a database file name to read/write persistent data and a read_only flag if no changes are desired
    """

If you want to create a second connection to an existing database, you can use the cursor() method. This can be useful, for example, to allow parallel threads to run queries independently. A single connection is thread-safe but is locked for the duration of each query, which effectively serializes database access in this case. This can also be addressed with DuckDB configuration parameters, for example:

duckdb.connect("/tmp/test.db", config={"external_threads": 2, "threads": 3, "worker_threads": 4})

Connections are closed implicitly when they go out of scope or if they are explicitly closed using close(). Once the last connection to a database instance is closed, the database instance is closed as well.

Read more at: https://duckdb.org/docs/api/python/overview.html#startup--shutdown

All allowed configurations are listed at: https://duckdb.org/docs/sql/configuration.html
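
One possible shape for this, as a rough sketch rather than a proposal for the exact implementation: the hook could read optional settings from the Airflow connection's JSON "extra" field and forward them to duckdb.connect(). The helper name build_connect_kwargs and the "read_only" and "config" keys inside extra are hypothetical, chosen only for illustration, and the sketch assumes the hook takes the database file path from the connection's host field.

```python
import json
from typing import Any, Dict, Optional


def build_connect_kwargs(host: Optional[str], extra: Optional[str]) -> Dict[str, Any]:
    """Translate Airflow-style connection fields into duckdb.connect() keyword arguments.

    `host` is the database file path; `extra` is the connection's JSON "extra" string.
    The "read_only" and "config" keys in `extra` are hypothetical names for this sketch.
    """
    options = json.loads(extra) if extra else {}

    kwargs: Dict[str, Any] = {
        # Fall back to an in-memory database when no file path is given.
        "database": host or ":memory:",
        "read_only": bool(options.get("read_only", False)),
    }

    # Only forward `config` when the user supplied one, so the default
    # behaviour of duckdb.connect() is unchanged otherwise.
    config = options.get("config")
    if config is not None:
        if not isinstance(config, dict):
            raise ValueError("'config' must be a JSON object")
        kwargs["config"] = config

    return kwargs


# Inside the hook, the result could then be passed straight through, e.g.:
#     duckdb.connect(**build_connect_kwargs(airflow_conn.host, airflow_conn.extra))
```

With this approach, a connection whose extra is {"config": {"threads": 3}} would open the database with that setting, while connections without any extra would keep behaving as they do today.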