duckdb/dbt-duckdb

Save snapshots to Azure Blob storage

namtab-ma-i opened this issue · 3 comments

I use an in-memory DuckDB database and my sources are all located in Azure Blob Storage. I materialize models as external tables in the same storage and that works without issues. However, it seems that I can't use the external materialization option for snapshots.

Is there any way to do this? Or is there a way to set the .duckdb path to be in the blob storage?

Config for attaching an external DuckDB file, and the resulting error:

profiles.yml

```yaml
outputs:
  dev:
    type: duckdb
    attach:
      - path: abfs://<my_container>/dev.duckdb
        alias: snapshot_db
    extensions:
      - azure
      - parquet
    filesystems:
      - fs: abfs
        anon: false
        account_name: "{{ env_var('ADLS_STORAGE_ACCOUNT') }}"
        account_key: "{{ env_var('ADLS_STORAGE_ACCOUNT_KEY') }}"
```

This results in:

```
Encountered an error:
Runtime Error
  NotImplementedError: File mode not supported

At:
  /IdeaProjects/ray-inference/.venv/lib/python3.9/site-packages/adlfs/spec.py(1957): __init__
  /IdeaProjects/ray-inference/.venv/lib/python3.9/site-packages/adlfs/spec.py(1833): _open
  /IdeaProjects/ray-inference/.venv/lib/python3.9/site-packages/fsspec/spec.py(1298): open
```

Yeah, snapshots require table mutations (i.e., UPDATE statements), which aren't supported for external materializations. Blob-store-based .duckdb files are supported, but alas, they are read-only. The TL;DR is that both of these features require the ability to do random writes, which is the one thing blob stores are not designed to do.
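For the read-only case, if I remember the attach syntax correctly, you can mark a blob-hosted database as read-only in the profile so the attach doesn't attempt a writable open (a sketch based on the attach config above; the container name is the same placeholder):

```yaml
attach:
  - path: abfs://<my_container>/dev.duckdb
    alias: snapshot_db
    read_only: true
```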

If you copy the .duckdb file to a local filesystem, you can do all of these things to it and then write the resulting file back out to blob storage once you're done. That's the best option I have for you at the moment.