
Azure extension for DuckDB

Primary language: C++. License: MIT.

DuckDB Azure Extension

This extension adds a filesystem abstraction for Azure Blob Storage to DuckDB. To use it, install the latest version of DuckDB. The extension currently supports only reads and globs.
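The extension is distributed through DuckDB's extension repository, so it can be installed and loaded explicitly from SQL. A minimal sketch (recent DuckDB versions may also autoload it the first time an az:// path is used):

```sql
-- Download the extension from DuckDB's extension repository (once per DuckDB version)
INSTALL azure;
-- Load it into the current session
LOAD azure;
```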

Basics

Set up authentication (this uses either the Azure CLI or a managed identity):

CREATE SECRET secret1 (
    TYPE AZURE,
    PROVIDER CREDENTIAL_CHAIN,
    ACCOUNT_NAME '⟨storage account name⟩'
);
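To check that the secret was registered, you can query DuckDB's duckdb_secrets() table function. This is a sketch; secret values are redacted in its output by default, and the exact column set may vary between DuckDB versions.

```sql
-- Show the secret just created; credential values are not printed in clear text
SELECT name, type, provider, scope
FROM duckdb_secrets()
WHERE name = 'secret1';
```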

Then, to query a file on Azure:

SELECT count(*) FROM 'az://<my_container>/<my_file>.<parquet_or_csv>';
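The bare path form lets DuckDB pick a reader from the file extension. Calling the reader function explicitly should be equivalent and lets you pass format-specific options; the container and file names below are placeholders, as above.

```sql
-- Explicit Parquet reader on an Azure blob
SELECT count(*) FROM read_parquet('az://<my_container>/<my_file>.parquet');

-- Explicit CSV reader, here forcing the first row to be treated as a header
SELECT count(*) FROM read_csv('az://<my_container>/<my_file>.csv', header = true);
```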

Globbing is also supported:

SELECT count(*) FROM 'az://dummy_container/*.csv';
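Because globbing goes through the same filesystem layer, DuckDB's glob() table function can be used to see which blobs a pattern matches before reading them. The second query is a sketch that assumes all matching CSV files share the same columns; filename = true adds a column recording the source blob of each row.

```sql
-- List the blobs matched by the pattern without reading their contents
SELECT file FROM glob('az://dummy_container/*.csv');

-- Read every match and count rows per source blob
SELECT filename, count(*) AS n_rows
FROM read_csv('az://dummy_container/*.csv', filename = true)
GROUP BY filename;
```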

Other authentication methods

The following authentication methods are also available:

Connection string

CREATE SECRET secret2 (
    TYPE AZURE,
    CONNECTION_STRING '<value>'
);
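For reference, a storage account connection string in Azure's standard account-key form looks like the value below. The account name and key are placeholders to replace with values from your storage account; other connection string forms (for example with a SAS token) exist as well.

```sql
-- Same secret as above, with a standard account-key connection string filled in
CREATE SECRET secret2 (
    TYPE AZURE,
    CONNECTION_STRING 'DefaultEndpointsProtocol=https;AccountName=⟨storage account name⟩;AccountKey=⟨account key⟩;EndpointSuffix=core.windows.net'
);
```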

Service Principal

(To authenticate with a client certificate instead, replace CLIENT_SECRET with CLIENT_CERTIFICATE_PATH.)

CREATE SECRET secret3 (
    TYPE AZURE,
    PROVIDER SERVICE_PRINCIPAL,
    TENANT_ID '⟨tenant id⟩',
    CLIENT_ID '⟨client id⟩',
    CLIENT_SECRET '⟨client secret⟩',
    ACCOUNT_NAME '⟨storage account name⟩'
);
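Applying the certificate substitution described above gives a statement like the following. This is a sketch: the secret name secret3_cert is illustrative, and the certificate path is a placeholder for a certificate file accepted by the Azure SDK.

```sql
-- Service principal authenticated with a certificate file instead of a client secret
CREATE SECRET secret3_cert (
    TYPE AZURE,
    PROVIDER SERVICE_PRINCIPAL,
    TENANT_ID '⟨tenant id⟩',
    CLIENT_ID '⟨client id⟩',
    CLIENT_CERTIFICATE_PATH '⟨path to certificate file⟩',
    ACCOUNT_NAME '⟨storage account name⟩'
);
```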

Access token

(The token's audience must be https://storage.azure.com.)

CREATE SECRET secret4 (
    TYPE AZURE,
    PROVIDER ACCESS_TOKEN,
    ACCESS_TOKEN '<value>',
    ACCOUNT_NAME '⟨storage account name⟩'
);

Anonymous

CREATE SECRET secret5 (
    TYPE AZURE,
    PROVIDER CONFIG,
    ACCOUNT_NAME '⟨storage account name⟩'
);
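Anonymous access only works for containers that allow anonymous (public) read access on the Azure side. Once such a secret exists, a public blob is queried the same way as any other; the container and file names below are placeholders.

```sql
-- Read the first rows of a blob in a container configured for anonymous read access
SELECT * FROM 'az://⟨public container⟩/⟨file⟩.parquet' LIMIT 10;
```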

Supported architectures

The extension is tested and distributed for Linux (x64, arm64), macOS (x64, arm64), and Windows (x64).

Documentation

See the Azure page in the DuckDB documentation.

Check out the tests in test/sql for more examples.

Building

For development, this extension requires CMake, Python 3, a C++11-compliant compiler, and the Azure C++ SDK. Run make in the root directory to compile the sources, make debug to build a non-optimized debug version, and make test to verify that your build still works after making changes. Install the Azure C++ SDK with vcpkg and set the VCPKG_TOOLCHAIN_PATH environment variable when building, for example:

sudo apt-get update && sudo apt-get install -y git g++ cmake ninja-build libssl-dev
git clone --recursive https://github.com/duckdb/duckdb_azure
git clone https://github.com/microsoft/vcpkg
./vcpkg/bootstrap-vcpkg.sh
cd duckdb_azure
GEN=ninja VCPKG_TOOLCHAIN_PATH=$PWD/../vcpkg/scripts/buildsystems/vcpkg.cmake make
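To try a local build from a separately installed DuckDB, you can load the built extension file directly. This sketch assumes the standard DuckDB extension-template output layout (not stated in this README), that the DuckDB shell was started with the -unsigned flag because locally built extensions are not signed, and that the shell's version matches the DuckDB version the extension was built against.

```sql
-- Run inside a DuckDB shell started with: duckdb -unsigned
-- Path assumes the extension-template build layout, relative to the repository root
LOAD 'build/release/extension/azure/azure.duckdb_extension';
```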

Please also refer to our Build Guide and Contribution Guide.