duckdb/duckdb-wasm

HTTPFS + S3 doesn't honour `s3_url_style`

rmoff commented

Using https://shellwip.duckdb.org/

DuckDB Web Shell
Database: v0.7.2-dev911
Package:  @duckdb/duckdb-wasm@1.11.0

Load the HTTPFS and Parquet extensions

duckdb> load httpfs;
┌┐
└┘
Elapsed: 194 ms

duckdb> load parquet;
┌┐
└┘
Elapsed: 245 ms

Set S3 parameters

duckdb> SET s3_endpoint='rmoff-test.us-east-2.lakefscloud.io';
   ...> SET s3_access_key_id='xxxxxx';
   ...> SET s3_secret_access_key='xxxxx';
   ...> SET s3_url_style='path';
   ...> SET s3_region='us-east-1';
   ...> 
┌┐
└┘

Note that `s3_url_style` has been set to `path`, as this query confirms:

duckdb> SELECT * FROM duckdb_settings() where name ilike 's3_url_s%';
┌──────────────┬───────┬────────────────────────────────────────────┬────────────┐
│ name         ┆ value ┆ description                                ┆ input_type │
╞══════════════╪═══════╪════════════════════════════════════════════╪════════════╡
│ s3_url_style ┆ path  ┆ S3 url style ('vhost' (default) or 'path') ┆ VARCHAR    │
└──────────────┴───────┴────────────────────────────────────────────┴────────────┘
Elapsed: 1 ms

but when running a query, it uses vhost style and thus fails:

duckdb> SELECT * FROM read_parquet('s3://drones03/main/drone-registrations/Registations-RecFlyer-Active-*.parquet') limit 5;
Invalid Error: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'https://drones03.rmoff-test.us-east-2.lakefscloud.io/main/drone-registrations/Registations-RecFlyer-Active-*.parquet'.
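
For reference, the difference between the two addressing styles can be sketched as below. This is only an illustration of the general S3 convention (vhost puts the bucket in the hostname, path puts it in the first path segment); `s3_to_https` is a hypothetical helper written for this issue, not a DuckDB or duckdb-wasm function, and it assumes HTTPS and no custom port.

```python
def s3_to_https(s3_uri: str, endpoint: str, url_style: str = "vhost") -> str:
    """Map an s3:// URI to an HTTPS URL for the given addressing style.

    Hypothetical helper for illustration only; not part of DuckDB.
    """
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {s3_uri!r}")
    # Everything up to the first "/" is the bucket, the rest is the object key.
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket in {s3_uri!r}")
    if url_style == "vhost":
        # Bucket becomes a subdomain of the endpoint.
        return f"https://{bucket}.{endpoint}/{key}"
    if url_style == "path":
        # Bucket becomes the first path segment.
        return f"https://{endpoint}/{bucket}/{key}"
    raise ValueError(f"unknown url_style: {url_style!r}")


uri = "s3://drones03/main/drone-registrations/Registations-RecFlyer-Active-*.parquet"
endpoint = "rmoff-test.us-east-2.lakefscloud.io"

vhost_url = s3_to_https(uri, endpoint, "vhost")  # the form seen in the error above
path_url = s3_to_https(uri, endpoint, "path")    # the form expected with s3_url_style='path'
print(vhost_url)
print(path_url)
```

With `s3_url_style='path'`, the request would be expected to go to the second form, with `drones03` appearing after the hostname rather than in front of it.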

Thank you for testing this out @rmoff.

I had a proper look. Fixing this will require some rework specific to using browser-enabled requests instead of the ones provided by httplib, but it is blocked on extension loading support (#1202).

I will come back when there is news on this.