tobilg/duckdb-nodejs-layer

arrow extensions

rickiesmooth opened this issue · 29 comments

I'm trying to load the arrow extension on an ARM lambda, by executing the following statement:

INSTALL arrow; LOAD arrow;

and I'm met with the following error:

INFO	[Error: IO Error: Extension "/tmp/.duckdb/extensions/v0.8.0/linux_arm64/arrow.duckdb_extension"
could not be loaded: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /tmp/.duckdb/extensions/v0.8.0/linux_arm64/arrow.duckdb_extension)] 
{
  errno: -1, 
  code: 'DUCKDB_NODEJS_ERROR', 
  errorType: 'IO'
}

Do you maybe have some pointers on how to fix this?

tobilg commented

You can't install default extensions which weren't compiled on Amazon Linux 2... This leads to the exact error you're seeing.

gotcha so if i switch to x86 it should just work?

Is it cumbersome to compile the default extensions for Amazon Linux 2?

tobilg commented

No, this will be the same problem with x86 as well. This is due to GLIBC incompatibilities. I have a build pipeline for extensions here: https://github.com/tobilg/duckdb-nodejs-layer/blob/main/Dockerfile.spatial.x86_64
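As a rough diagnostic for the GLIBC mismatch described above, a Node.js process can report the glibc version it is running against and compare it with the version named in the error (2.27). This is only a sketch: `compareVersions`, `runtimeGlibcVersion` and `glibcSatisfies` are hypothetical helper names, and it assumes a Node.js runtime that exposes `process.report.getReport()` with a `glibcVersionRuntime` header field (only present on glibc-based Linux).

```javascript
// Compare dotted version strings numerically, e.g. "2.26" vs "2.27".
function compareVersions(a, b) {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

// Returns the glibc version of the running process, or null when the
// diagnostic report does not include it (non-Linux or non-glibc systems).
function runtimeGlibcVersion() {
  const report = process.report && process.report.getReport();
  const header = report && report.header;
  return (header && header.glibcVersionRuntime) || null;
}

// True when the runtime glibc is at least the version the extension requires.
function glibcSatisfies(required, runtime) {
  return runtime !== null && compareVersions(runtime, required) >= 0;
}

const runtime = runtimeGlibcVersion();
console.log("runtime glibc:", runtime, "satisfies 2.27:", glibcSatisfies("2.27", runtime));
```

If the logged runtime version is lower than the one in the error message, an extension binary built against the newer glibc cannot be loaded, regardless of CPU architecture.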

tobilg commented

Would you mind giving some insights about your use case @rickiesmooth?

I'm looking at .github/workflows/main.yml, and I have an idea on how to add some workflow steps for building something like Dockerfile.arrow.arm64 using spatial as an example.

This would work great if I just need the arrow extension, but how would that work if I need both the spatial extension and the arrow extension? Have you considered having one "fat" image that holds all the official extensions which you can load, or would the layer size become too big? Alternatively, would it maybe be possible to build a layer with the correct GLIBC versions and load the extensions directly from S3 à la http://extensions.duckdb.org/v{release_version_number}/{platform_name}/{extension_name}.duckdb_extension.gz? I'm pretty clueless on this subject, so sorry for sounding uninformed. 😅

My specific use case is using arrowIPCStream in combination with AWS Lambda response streaming, so I only need to install the arrow extension for that.

tobilg commented

You'd have to build "all" relevant extensions for Amazon Linux 2 and store them somewhere accessible via HTTPS. Which is a lot of effort honestly. It cost me about a dozen hours to get the spatial extension to work. Then, there’s the issue that some extensions need specific DuckDB versions, and are not compatible with others… It’s not a simple thing to accomplish…

hmm too bad, since the docs explicitly mention that downloading an extension from S3 could be helpful when building a lambda, I was hoping that that would make it a bit easier.

What would you suggest? Make a similar layer as the spatial layer?

tobilg commented

Can you link the docs you mention? Well actually downloading from S3 is the easy part… Building and storing them is much more effort unfortunately.

I see, too bad! This was the doc I was reading: https://duckdb.org/docs/extensions/working_with_extensions

Downloading an extension directly could be helpful when building a lambda or container that uses DuckDB. DuckDB extensions are stored in public S3 buckets, but the directory structure of those buckets is not searchable. As a result, a direct URL to the file must be used. To directly download an extension file, use the following format:

https://extensions.duckdb.org/v{release_version_number}/{platform_name}/{extension_name}.duckdb_extension.gz

For example:

https://extensions.duckdb.org/v0.8.1/windows_amd64/json.duckdb_extension.gz

The list of supported platforms may increase over time, but the current list of platforms includes:

linux_amd64_gcc4
linux_amd64
linux_arm64
osx_amd64
osx_arm64
wasm_eh DuckDB-Wasm’s extensions
wasm_mvp DuckDB-Wasm’s extensions
windows_amd64
windows_amd64_rtools
See above for a list of extension names and how to pull the latest list of extensions.

e.g. https://extensions.duckdb.org/v0.8.1/linux_arm64/arrow.duckdb_extension.gz
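The URL format quoted from the docs can be turned into a small helper. This is a sketch only: `extensionUrl` and its `baseUrl` parameter are hypothetical names, not part of DuckDB or its Node.js bindings, and the `version` argument is expected to be the full path segment including the leading `v`.

```javascript
// Build the direct download URL for an extension, following the
// {base}/{version}/{platform}/{name}.duckdb_extension.gz pattern from the docs.
function extensionUrl({ version, platform, name, baseUrl = "https://extensions.duckdb.org" }) {
  const base = baseUrl.replace(/\/+$/, ""); // drop any trailing slashes
  return `${base}/${version}/${platform}/${name}.duckdb_extension.gz`;
}

console.log(extensionUrl({ version: "v0.8.1", platform: "linux_arm64", name: "arrow" }));
```

Note that a URL built this way only points at a binary compiled for the listed platform; it says nothing about whether that binary is compatible with the glibc version inside Lambda.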

tobilg commented

You can now try loading it from https://github.com/tobilg/duckdb-nodejs-layer/raw/main/release/extensions/v0.8.1/arrow.duckdb_extension Please give me some feedback whether this works or not. Thanks

Sorry but I'm unsure how to load it.

I naively tried:

INSTALL https://github.com/tobilg/duckdb-nodejs-layer/raw/main/release/extensions/v0.8.1/arrow.duckdb_extension; LOAD arrow;
tobilg commented

And that doesn't work? What's the error message? How did you test it? In Lambda directly? Did you allow unsigned extensions? https://duckdb.org/docs/archive/0.8.1/extensions/overview.html#unsigned-extensions

BTW, the compiled extension is x86.

I get the following error when I test it in the Lambda directly:

Error: IO Error: Failed to read extension from "https://github.com/tobilg/duckdb-nodejs-layer/raw/main/release/extensions/v0.8.1/arrow.duckdb_extension": no such file] {
  errno: -1,
  code: 'DUCKDB_NODEJS_ERROR',
  errorType: 'IO'
}

my lambda:

// Instantiate DuckDB
const duckDB = new DuckDB.Database(":memory:", {
  allow_unsigned_extensions: "true",
});

// Create connection
const connection = duckDB.connect();

// Store initialization
let isInitialized = false;

export const handler = awslambda.streamifyResponse(async (event, responseStream, context) => {
  console.log(event, context);
  try {
    // Check if DuckDB has been initialized
    if (!isInitialized) {
      await query(`
        SET home_directory='/tmp';
        SET enable_http_metadata_cache=true;
        SET enable_object_cache=true;
        INSTALL 'https://github.com/tobilg/duckdb-nodejs-layer/raw/main/release/extensions/v0.8.1/arrow.duckdb_extension';
      `);
      isInitialized = true;
    }

While when I visit the link in my browser it downloads the extension.

ah I've tested it on arm, I can try with x86 later today!
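The handler snippet above calls a `query` helper that is not shown. One plausible shape for it, assuming the connection exposes a callback-style `all(sql, callback)` method as the `duckdb` npm package does, is a thin Promise wrapper. `makeQuery` is a hypothetical name, and the stub connection exists only so the sketch runs without DuckDB installed.

```javascript
// Wrap a callback-style connection in a function that returns a Promise.
function makeQuery(connection) {
  return (sql) =>
    new Promise((resolve, reject) => {
      connection.all(sql, (err, rows) => (err ? reject(err) : resolve(rows)));
    });
}

// Stand-in for a real DuckDB connection: records the SQL it receives and
// answers with one fake row. Replace with duckDB.connect() in a real Lambda.
const stubConnection = {
  calls: [],
  all(sql, callback) {
    this.calls.push(sql);
    callback(null, [{ answer: 42 }]);
  },
};

const query = makeQuery(stubConnection);
query("SELECT 42 AS answer;").then((rows) => console.log(rows));
```

With a wrapper like this, `await query(...)` inside the handler rejects with the same error object DuckDB passes to the callback, which is what surfaces in the logs above.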

tobilg commented

Haven't tried it, but I was pointed to duckdb/duckdb#6049 (comment) in the DuckDB Discord.

tobilg commented

I think I was able to provide a solution for x86 based DuckDB Lambda functions:

Try to do the following:

  • SET custom_extension_repository = 'http://extensions.quacking.cloud';
  • INSTALL arrow;

Works with my DuckDB Lambda deployment. I haven't checked the functionality of the Arrow extension though.

wow that's slick! I've tried it out, but for some reason it tries to load an AMD extension?

Error: HTTP Error: Failed to download extension "arrow" at URL "http://extensions.quacking.cloud/v0.8.0/linux_amd64/arrow.duckdb_extension.gz
await query(`
  SET home_directory='/tmp';
  SET enable_http_metadata_cache=true;
  SET enable_object_cache=true;
  SET custom_extension_repository = 'http://extensions.quacking.cloud';
  INSTALL arrow;
`);

I've updated my lambda to use the x86 architecture so I'm unsure why it tries to download the linux_amd64 architecture.

Full error:

2023-09-19T17:27:43.698Z	bbc0b6d7-ef1c-41fe-913d-738a332c6286	INFO	[Error: HTTP Error: Failed to download extension "arrow" at URL "http://extensions.quacking.cloud/v0.8.0/linux_amd64/arrow.duckdb_extension.gz"

Candidate extensions: "parquet"] {
  errno: -1,
  code: 'DUCKDB_NODEJS_ERROR',
  errorType: 'HTTP',
  statusCode: 403,
  response: '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>45ZF0EY0Q54J8Z6A</RequestId><HostId>shYHxazrKW3GTSrD4UMUkXyV/tY8CoaFp/GnJgUHHgeI549VQHYT+yHHm5469Kj/RcL4jcZQK30=</HostId></Error>',
  reason: 'Forbidden',
  headers: {
    'Alt-Svc': 'h3=":443"; ma=86400',
    Connection: 'close',
    'Content-Type': 'application/xml',
    Date: 'Tue, 19 Sep 2023 17:27:43 GMT',
    Server: 'AmazonS3',
    'Transfer-Encoding': 'chunked',
    Via: '1.1 936f33bed45438343f0ef2adff442814.cloudfront.net (CloudFront)',
    'X-Amz-Cf-Id': 'F-TjTP5qVLObpyePoy5-D5kYDyZ6RymzXyhzxYfc3ULjsR0SQaPcOA==',
    'X-Amz-Cf-Pop': 'IAD89-C1',
    'X-Cache': 'Error from cloudfront'
  }
}

layer:

duckdb-nodejs-x86, version 3 (nodejs14.x, nodejs16.x, nodejs18.x): arn:aws:lambda:us-east-1:041475135427:layer:duckdb-nodejs-x86:3

tobilg commented

Yeah, that layer version is still on v0.8.0. I'll have to update it to 0.8.1 I guess. The error comes from the plug-in not being published for this version to S3.

Can you try the spatial layer from https://github.com/tobilg/duckdb-nodejs-layer#arns? That should work…

soo close:

2023-09-19T19:33:03.199Z	5cdffb53-23b8-4b67-9cfd-b69f997aaee8	INFO	[
Error: Invalid Input Error: Extension "/tmp/.duckdb/extensions/a532702b9a/linux_amd64/arrow.duckdb_extension" version (v0.8.2-dev3190) does not match DuckDB version (0.8.2-dev2083)
] {
  errno: -1,
  code: 'DUCKDB_NODEJS_ERROR',
  errorType: 'Invalid Input'
}

I saw some comments in Discord about dev versions, so if I understand correctly, once we can use a stable version here, everything (spatial & arrow) will just work?

tobilg commented

Hm, that’s somehow strange because I use the same layer version, and it worked in my case.

But unfortunately that’s what I meant with the version conflicts in DuckDB’s Discord. It’s not easy to downgrade the versions because the extensions reference other commits in the submodules. Will have to think about how this can be solved.
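When debugging this kind of conflict, it can help to pull both versions out of the error message and log them side by side. The sketch below assumes the exact message wording shown in this thread (other DuckDB versions may phrase it differently); `parseVersionMismatch` is a hypothetical helper name.

```javascript
// Extract the extension path and both versions from a DuckDB
// "version ... does not match DuckDB version ..." error message.
// Returns null when the message has a different shape.
function parseVersionMismatch(message) {
  const pattern = /Extension "([^"]+)" version \(([^)]+)\) does not match DuckDB version \(([^)]+)\)/;
  const match = pattern.exec(message);
  if (!match) return null;
  const stripV = (v) => v.replace(/^v/, ""); // the extension side carries a leading "v"
  return {
    path: match[1],
    extensionVersion: stripV(match[2]),
    duckdbVersion: stripV(match[3]),
  };
}

const sample =
  'Invalid Input Error: Extension "/tmp/.duckdb/extensions/a532702b9a/linux_amd64/arrow.duckdb_extension" ' +
  "version (v0.8.2-dev3190) does not match DuckDB version (0.8.2-dev2083)";
console.log(parseVersionMismatch(sample));
```

The two versions must be identical for the extension to load, so a difference in the `-dev` suffix alone is enough to trigger the error.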

tobilg commented

Can you do a select version();? It should show 0.8.2-dev2083 if you're using duckdb-nodejs-spatial-x86:1... What puzzles me is that I use the above layer version, and it works flawlessly.

it does show 0.8.2-dev2083 when I do select version();

Forgot to mention that I needed to install and load it like so:

INSTALL arrow;
LOAD arrow;

Otherwise I'd get the following error:

Error: Catalog Error: Function with name "to_arrow_ipc" is not in the catalog, but it exists in the arrow extension.

To install and load the extension, run:
INSTALL arrow;
LOAD arrow;] {
  errno: -1,
  code: 'DUCKDB_NODEJS_ERROR',
  errorType: 'Catalog'
}
tobilg commented

So, I think I got it working finally... I published a new test layer which is built upon the latest supported Spatial extension version of DuckDB. Then, I forked the Arrow extension and manually updated to the same DuckDB commit.

I published this as a test Lambda layer: arn:aws:lambda:us-east-1:041475135427:layer:duckdb-nodejs-spatial-test-x86:1, and the GitHub Action Workflow to publish the regular layer is currently running. This will then be usable via arn:aws:lambda:$REGION:041475135427:layer:duckdb-nodejs-spatial-x86:2.

You should then be able to do:

  • LOAD '/opt/nodejs/node_modules/duckdb/extensions/spatial.duckdb_extension'; (this loads the spatial extension which is included in the layer)
  • SET custom_extension_repository = 'http://extensions.quacking.cloud'; (this sets the custom extension repo)
  • INSTALL arrow; (this downloads and installs the arrow extension from http://extensions.quacking.cloud/9db510bd11/linux_amd64/arrow.duckdb_extension.gz)
  • LOAD arrow; (this loads the arrow extension)
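The four steps above could be issued from a Lambda roughly as follows. This is a sketch under assumptions: `buildInitStatements` and `initialize` are hypothetical helpers, `query` stands for any function that runs one SQL statement and returns a Promise, and the values are interpolated as trusted constants (not user input).

```javascript
// Produce the initialization statements in the order they must run:
// bundled extension first, then the custom repository, then INSTALL/LOAD pairs.
function buildInitStatements({ bundledExtensionPath, repository, extensions = [] }) {
  const statements = [];
  if (bundledExtensionPath) statements.push(`LOAD '${bundledExtensionPath}';`);
  if (repository) statements.push(`SET custom_extension_repository = '${repository}';`);
  for (const name of extensions) {
    statements.push(`INSTALL ${name};`, `LOAD ${name};`);
  }
  return statements;
}

// Run the statements one at a time so a failure points at a single step.
async function initialize(query, options) {
  for (const sql of buildInitStatements(options)) {
    await query(sql);
  }
}

const statements = buildInitStatements({
  bundledExtensionPath: "/opt/nodejs/node_modules/duckdb/extensions/spatial.duckdb_extension",
  repository: "http://extensions.quacking.cloud",
  extensions: ["arrow"],
});
console.log(statements.join("\n"));
```

Running each statement separately is a design choice for easier debugging; sending them in one batch, as in the earlier handler snippet, should behave the same when every step succeeds.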

Awesome! I'll test it out tonight 🙌

this worked like a charm, thank you so much!

tobilg commented

Great! Curious about what you'll build though 😬

ping me on discord if you want a sneak peek 😉

tobilg commented

@rickiesmooth I just published the spatial and arrow extensions for DuckDB v0.9.0, which is available with arn:aws:lambda:us-east-1:041475135427:layer:duckdb-nodejs-x86:5:

  • SET custom_extension_repository = 'http://extensions.quacking.cloud';
  • INSTALL arrow;
  • LOAD arrow;

Works for me at least :-)

great! I'll give it a try and report back 🫡