arrow extensions
rickiesmooth opened this issue · 29 comments
I'm trying to load the arrow extension on an ARM lambda, by executing the following statement:
INSTALL arrow; LOAD arrow;
and I'm met with the following error:
INFO [Error: IO Error: Extension "/tmp/.duckdb/extensions/v0.8.0/linux_arm64/arrow.duckdb_extension"
could not be loaded: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /tmp/.duckdb/extensions/v0.8.0/linux_arm64/arrow.duckdb_extension)]
{
errno: -1,
code: 'DUCKDB_NODEJS_ERROR',
errorType: 'IO'
}
Do you maybe have some pointers on how to fix this?
You can't install default extensions which weren't compiled on Amazon Linux 2... This leads to the exact error you're seeing.
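To make the incompatibility concrete, here is a minimal sketch that reads the required GLIBC version out of the error text and compares it with the runtime's glibc. The helper names are hypothetical, and the "2.26" fallback is an assumption based on the glibc version Amazon Linux 2 ships with.

```javascript
// Extract the GLIBC version an extension requires from a loader error message.
function requiredGlibc(message) {
  const match = /version `GLIBC_(\d+(?:\.\d+)*)' not found/.exec(message);
  return match ? match[1] : null;
}

// Compare dotted versions numerically: negative if a < b, 0 if equal, positive if a > b.
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// glibc version reported by Node.js (undefined on platforms that do not use glibc).
function runtimeGlibc() {
  const report = process.report && process.report.getReport();
  return report && report.header ? report.header.glibcVersionRuntime : undefined;
}

const message =
  "/lib64/libm.so.6: version `GLIBC_2.27' not found (required by arrow.duckdb_extension)";
const needed = requiredGlibc(message);
const available = runtimeGlibc() || "2.26"; // assumption: fall back to Amazon Linux 2's glibc
console.log(`extension needs GLIBC ${needed}, runtime has ${available}`);
console.log(compareVersions(available, needed) >= 0 ? "loadable" : "glibc too old");
```

Run inside the Lambda, this would only confirm what the error already says. The fix is still an extension built against the older glibc.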
gotcha so if i switch to x86 it should just work?
Is it cumbersome to compile the default extensions for Amazon Linux 2?
No, this will be the same problem with x86 as well. This is due to GLIBC incompatibilities. I have a build pipeline for extensions here: https://github.com/tobilg/duckdb-nodejs-layer/blob/main/Dockerfile.spatial.x86_64
Would you mind giving some insights about your use case @rickiesmooth?
I'm looking at .github/workflows/main.yml, and I have an idea on how to add some workflow steps for building something like Dockerfile.arrow.arm64 using spatial as an example.
This would work great if I just need the arrow extension, but how would that work if I need both the spatial extension and the arrow extension? Have you considered having one "fat" image that holds all the official extensions which you can load, or would the layer size become too big? Alternatively, would it maybe be possible to build a layer with the correct GLIBC versions and load the extensions directly from S3, à la http://extensions.duckdb.org/v{release_version_number}/{platform_name}/{extension_name}.duckdb_extension.gz? I'm pretty clueless on this subject, so sorry for sounding uninformed. 😅
My specific use case is using arrowIPCStream in combination with AWS Lambda response streaming...so I only just need to install the arrow extension for that..
You‘d have to build „all“ relevant extensions for Amazon Linux 2 and store them somewhere accessible via HTTPS. Which is a lot of effort honestly. It cost me about a dozen hours to get the spatial extension to work. Then, there’s the issue that some extensions need specific DuckDB versions, and are not compatible with others… It’s not a simple thing to accomplish…
hmm too bad, since the docs explicitly mention that downloading an extension from S3 could be helpful when building a lambda, I was hoping that that would make it a bit easier.
What would you suggest? Make a similar layer as the spatial layer?
Can you link the docs you mention? Well actually downloading from S3 is the easy part… Building and storing them is much more effort unfortunately.
I see, too bad! This was the doc I was reading: https://duckdb.org/docs/extensions/working_with_extensions
Downloading an extension directly could be helpful when building a lambda or container that uses DuckDB. DuckDB extensions are stored in public S3 buckets, but the directory structure of those buckets is not searchable. As a result, a direct URL to the file must be used. To directly download an extension file, use the following format:
http://extensions.duckdb.org/v{release_version_number}/{platform_name}/{extension_name}.duckdb_extension.gz
For example:
https://extensions.duckdb.org/v0.8.1/windows_amd64/json.duckdb_extension.gz
The list of supported platforms may increase over time, but the current list of platforms includes:
linux_amd64_gcc4
linux_amd64
linux_arm64
osx_amd64
osx_arm64
wasm_eh (DuckDB-Wasm’s extensions)
wasm_mvp (DuckDB-Wasm’s extensions)
windows_amd64
windows_amd64_rtools
See above for a list of extension names and how to pull the latest list of extensions.
e.g. https://extensions.duckdb.org/v0.8.1/linux_arm64/arrow.duckdb_extension.gz
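As a small sketch of the URL format quoted from the docs, the following builds a download URL from its three parts. The function and parameter names are hypothetical. It only covers release versions, since dev builds appear to use a commit hash in place of the `v`-prefixed version.

```javascript
// Build a direct download URL for a DuckDB extension of a release version.
// The default repository is the official one; a custom repository can be passed instead.
function extensionUrl({
  releaseVersion,
  platform,
  name,
  repository = "https://extensions.duckdb.org",
}) {
  return `${repository}/v${releaseVersion}/${platform}/${name}.duckdb_extension.gz`;
}

console.log(
  extensionUrl({ releaseVersion: "0.8.1", platform: "linux_arm64", name: "arrow" })
);
// https://extensions.duckdb.org/v0.8.1/linux_arm64/arrow.duckdb_extension.gz
```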
You can now try loading it from https://github.com/tobilg/duckdb-nodejs-layer/raw/main/release/extensions/v0.8.1/arrow.duckdb_extension and please give me some feedback on whether this works or not. Thanks
Sorry but I'm unsure how to load it.
I naively tried:
INSTALL https://github.com/tobilg/duckdb-nodejs-layer/raw/main/release/extensions/v0.8.1/arrow.duckdb_extension; LOAD arrow;
And that doesn't work? What's the error message? How did you test it? In Lambda directly? Did you allow unsigned extensions? https://duckdb.org/docs/archive/0.8.1/extensions/overview.html#unsigned-extensions
BTW, the compiled extension is x86.
I get the following error when I test it in the Lambda directly:
Error: IO Error: Failed to read extension from "https://github.com/tobilg/duckdb-nodejs-layer/raw/main/release/extensions/v0.8.1/arrow.duckdb_extension": no such file] {
errno: -1,
code: 'DUCKDB_NODEJS_ERROR',
errorType: 'IO'
}
my lambda:
// Instantiate DuckDB
const duckDB = new DuckDB.Database(":memory:", {
allow_unsigned_extensions: "true",
});
// Create connection
const connection = duckDB.connect();
// Store initialization
let isInitialized = false;
export const handler = awslambda.streamifyResponse(async (event, responseStream, context) => {
console.log(event, context);
try {
// Check if DuckDB has been initialized
if (!isInitialized) {
await query(`
SET home_directory='/tmp';
SET enable_http_metadata_cache=true;
SET enable_object_cache=true;
INSTALL 'https://github.com/tobilg/duckdb-nodejs-layer/raw/main/release/extensions/v0.8.1/arrow.duckdb_extension';
`);
isInitialized = true;
}
While when I visit the link in my browser it downloads the extension.
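The snippet above does not show how `query` is defined. One plausible shape, as a sketch only, is a Promise wrapper around the connection's callback-style `all(sql, callback)` method. The stand-in connection below is hypothetical and exists only so the example runs without DuckDB installed.

```javascript
// Wrap a callback-style connection in a Promise-returning query function.
// Assumes the connection exposes all(sql, callback) with a Node-style (err, rows) callback.
function makeQuery(connection) {
  return (sql) =>
    new Promise((resolve, reject) => {
      connection.all(sql, (err, rows) => (err ? reject(err) : resolve(rows)));
    });
}

// Hypothetical stand-in for a DuckDB connection: echoes the SQL back as a single row.
const fakeConnection = {
  all(sql, callback) {
    setImmediate(() => callback(null, [{ sql }]));
  },
};

const query = makeQuery(fakeConnection);
query("SELECT 42 AS answer;").then((rows) => console.log(rows));
```

With the real package, `makeQuery(duckDB.connect())` would take the place of the stand-in.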
ah I've tested it on arm, I can try with x86 later today!
Haven't tried it, but I was pointed to duckdb/duckdb#6049 (comment) in the DuckDB Discord.
I think I was able to provide a solution for x86 based DuckDB Lambda functions:
Try to do the following:
SET custom_extension_repository = 'http://extensions.quacking.cloud';
INSTALL arrow;
Works with my DuckDB Lambda deployment. I haven't checked the functionality of the Arrow extension though.
wow that's slick! I've tried it out, but for some reason it tries to load an AMD extension?
Error: HTTP Error: Failed to download extension "arrow" at URL "http://extensions.quacking.cloud/v0.8.0/linux_amd64/arrow.duckdb_extension.gz
await query(`
SET home_directory='/tmp';
SET enable_http_metadata_cache=true;
SET enable_object_cache=true;
SET custom_extension_repository = 'http://extensions.quacking.cloud';
INSTALL arrow;
`);
I've updated my lambda to use the x86 architecture so I'm unsure why it tries to download the linux_amd64 architecture.
Full error:
2023-09-19T17:27:43.698Z bbc0b6d7-ef1c-41fe-913d-738a332c6286 INFO [Error: HTTP Error: Failed to download extension "arrow" at URL "http://extensions.quacking.cloud/v0.8.0/linux_amd64/arrow.duckdb_extension.gz"
Candidate extensions: "parquet"] {
errno: -1,
code: 'DUCKDB_NODEJS_ERROR',
errorType: 'HTTP',
statusCode: 403,
response: '<?xml version="1.0" encoding="UTF-8"?>\n' +
'<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>45ZF0EY0Q54J8Z6A</RequestId><HostId>shYHxazrKW3GTSrD4UMUkXyV/tY8CoaFp/GnJgUHHgeI549VQHYT+yHHm5469Kj/RcL4jcZQK30=</HostId></Error>',
reason: 'Forbidden',
headers: {
'Alt-Svc': 'h3=":443"; ma=86400',
Connection: 'close',
'Content-Type': 'application/xml',
Date: 'Tue, 19 Sep 2023 17:27:43 GMT',
Server: 'AmazonS3',
'Transfer-Encoding': 'chunked',
Via: '1.1 936f33bed45438343f0ef2adff442814.cloudfront.net (CloudFront)',
'X-Amz-Cf-Id': 'F-TjTP5qVLObpyePoy5-D5kYDyZ6RymzXyhzxYfc3ULjsR0SQaPcOA==',
'X-Amz-Cf-Pop': 'IAD89-C1',
'X-Cache': 'Error from cloudfront'
}
}
layer:
1 | duckdb-nodejs-x86 | 3 | nodejs14.x, nodejs16.x, nodejs18.x | - | arn:aws:lambda:us-east-1:041475135427:layer:duckdb-nodejs-x86:3
Yeah, that layer version is still on v0.8.0. I‘ll have to update it to 0.8.1 I guess. The error comes from the plug-in not being published for this version to S3.
Can you try the spatial layer from https://github.com/tobilg/duckdb-nodejs-layer#arns? That should work…
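On the platform name in the failing URL: linux_amd64 is DuckDB's directory name for 64-bit x86 Linux, so it is the expected platform on an x86 Lambda. A minimal sketch of the mapping from Node's `process.arch` values follows. The helper name is hypothetical, and only the two Linux platforms relevant here are covered.

```javascript
// Map Node.js process.arch values to DuckDB's Linux platform directory names.
const LINUX_PLATFORMS = { x64: "linux_amd64", arm64: "linux_arm64" };

function duckdbLinuxPlatform(arch = process.arch) {
  const platform = LINUX_PLATFORMS[arch];
  if (!platform) {
    throw new Error(`no known DuckDB Linux platform for arch "${arch}"`);
  }
  return platform;
}

console.log(duckdbLinuxPlatform("x64")); // linux_amd64
console.log(duckdbLinuxPlatform("arm64")); // linux_arm64
```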
soo close:
2023-09-19T19:33:03.199Z 5cdffb53-23b8-4b67-9cfd-b69f997aaee8 INFO [
Error: Invalid Input Error: Extension "/tmp/.duckdb/extensions/a532702b9a/linux_amd64/arrow.duckdb_extension" version (v0.8.2-dev3190) does not match DuckDB version (0.8.2-dev2083)
] {
errno: -1,
code: 'DUCKDB_NODEJS_ERROR',
errorType: 'Invalid Input'
}
I saw some comments in Discord about dev versions. If I understand correctly, once we're able to use a stable version here, everything (spatial & arrow) will just work?
Hm, that’s somehow strange because I use the same layer version, and it worked in my case.
But unfortunately that’s what I meant with the version conflicts in DuckDB’s Discord. It’s not easy to downgrade the versions because the extensions reference other commits in the submodules. Will have to think about how this can be solved.
Can you do a select version();? It should show 0.8.2-dev2083 if you're using duckdb-nodejs-spatial-x86:1... What puzzles me is that I use the above layer version, and it works flawlessly.
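For comparing the two versions in that error, a small sketch that pulls both out of the message text may help. The function name is hypothetical, and the regular expression is only matched against the wording of the error shown in this thread.

```javascript
// Pull the extension and DuckDB versions out of a "does not match" error message.
// A leading "v" is dropped so that both versions use the same notation.
function parseVersionMismatch(message) {
  const match =
    /version \(v?([^)]+)\) does not match DuckDB version \(v?([^)]+)\)/.exec(message);
  return match ? { extension: match[1], duckdb: match[2] } : null;
}

const mismatchError =
  'Extension "/tmp/.duckdb/extensions/a532702b9a/linux_amd64/arrow.duckdb_extension" ' +
  "version (v0.8.2-dev3190) does not match DuckDB version (0.8.2-dev2083)";

console.log(parseVersionMismatch(mismatchError));
// { extension: '0.8.2-dev3190', duckdb: '0.8.2-dev2083' }
```

Here the extension was built from a later dev commit (dev3190) than the DuckDB build in the layer (dev2083), which is why the load is refused.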
it does show 0.8.2-dev2083 when I do select version();
Forgot to mention that I needed to install and load it like so:
INSTALL arrow;
LOAD arrow;
Otherwise I'd get the following error:
Error: Catalog Error: Function with name "to_arrow_ipc" is not in the catalog, but it exists in the arrow extension.
To install and load the extension, run:
INSTALL arrow;
LOAD arrow;] {
errno: -1,
code: 'DUCKDB_NODEJS_ERROR',
errorType: 'Catalog'
}
So, I think I got it working finally... I published a new test layer which is built upon the latest supported Spatial extension version of DuckDB. Then, I forked the Arrow extension and manually updated to the same DuckDB commit.
I published this as a test Lambda layer: arn:aws:lambda:us-east-1:041475135427:layer:duckdb-nodejs-spatial-test-x86:1, and the GitHub Action Workflow to publish the regular layer is currently running. This will then be usable via arn:aws:lambda:$REGION:041475135427:layer:duckdb-nodejs-spatial-x86:2.
You should then be able to do:
LOAD '/opt/nodejs/node_modules/duckdb/extensions/spatial.duckdb_extension'; -- loads the spatial extension which is included in the layer
SET custom_extension_repository = 'http://extensions.quacking.cloud'; -- sets the custom extension repo
INSTALL arrow; -- installs the arrow extension from http://extensions.quacking.cloud/9db510bd11/linux_amd64/arrow.duckdb_extension.gz
LOAD arrow; -- loads the arrow extension
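In a Lambda handler, these statements only need to run once per container. A minimal sketch of that pattern, assuming a Promise-returning `runQuery(sql)` function is available, is shown here. The initializer helper and the recording stand-in are hypothetical.

```javascript
// The four statements from the comment above, in order.
const INIT_STATEMENTS = [
  "LOAD '/opt/nodejs/node_modules/duckdb/extensions/spatial.duckdb_extension';",
  "SET custom_extension_repository = 'http://extensions.quacking.cloud';",
  "INSTALL arrow;",
  "LOAD arrow;",
];

// Returns a function that runs the statements at most once per container.
// Concurrent callers share the same in-flight promise; a failure allows a retry.
function makeInitializer(runQuery, statements = INIT_STATEMENTS) {
  let pending = null;
  return () => {
    if (!pending) {
      pending = (async () => {
        for (const sql of statements) await runQuery(sql);
      })().catch((err) => {
        pending = null; // allow a retry on the next invocation
        throw err;
      });
    }
    return pending;
  };
}

// Hypothetical stand-in for a real query function: records statements instead of running them.
const executed = [];
const init = makeInitializer(async (sql) => {
  executed.push(sql);
});

init()
  .then(() => init()) // second call is a no-op
  .then(() => console.log(executed.length)); // 4
```

Running one statement per call keeps a failure easy to attribute to a specific step.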
Awesome! I'll test it out tonight 🙌
this worked like a charm, thank you so much!
Great! Curious about what you‘ll build though 😬
ping me on discord if you want a sneak peek 😉
@rickiesmooth I just published the spatial and arrow extensions for DuckDB v0.9.0, which is available with arn:aws:lambda:us-east-1:041475135427:layer:duckdb-nodejs-x86:5:
SET custom_extension_repository = 'http://extensions.quacking.cloud';
INSTALL arrow;
LOAD arrow;
Works for me at least :-)
great! I'll give it a try and report back 🫡