CloudCannon/pagefind

feature request: Python wrapper package, Python API

Opened this issue ยท 14 comments

I'm building a documentation site using mkdocs, a python-based static site generator. Adding a python wrapper package would let me install the pagefind CLI without a node.js toolchain or a script to download and validate the pre-compiled binary. A Python API would also be useful for writing plugins for mkdocs and other existing python static site generators.

I'd be happy to contribute the python wrapper package and API bindings to this repo. Alternately, I'd also be happy to create a separate repo for managing the python wrapper package if you don't want to increase the maintenance burden here.

bglw commented

๐Ÿ‘‹

There's precedence for both approaches. The Node wrapper is maintained in this repo, while there exists Nix and Juliet wrappers maintained externally.

I'm not a heavy Python user myself, so I'm slightly hesitant to take on maintenance โ€” however I think maintaining it in this repo would be the best result for users, so we should go with that. I've been wanting to offer a distribution channel for users without Node, and Python seems like a good avenue (plus I'd love to have it available in mkdocs, which I'm a big fan of).

One thing I would like to avoid is having the wrapper package download binaries from GitHub actions, since that has caused issues/slowdowns in the past. Is it possible to bundle a platform-specific binary into a python wrapper in a similar manner to the current npm optionalDependencies?

If you're happy contributing the code, and helping me get automated publishing set up, then it would be much appreciated! ๐Ÿ™‚ Let me know what I can do to help you.

Looks like I should be able to crib off of https://github.com/astral-sh/ruff/blob/main/.github/workflows/release.yaml, which uses maturin (part of the Py03 project) to shove all the relevant binaries and files into a single python package.

This is a long shot, but would you be open to using goreleaser for non-node packaging? On one hand, with a few lines of hacks GoReleaser can invoke cargo to build the binary and then repackage the binary for many non-node distribution formats including deb, rpm, HomeBrew, Scoop, etc. On the other hand, GoReleaser probably won't have first-class support for Rust for the foreseeable future. There's also a rust-releaser project that seems interesting, but it's not as mature as GoReleaser.

Update: my OSS time at work has dried up for the moment; I'm not sure when I'll get back to this.

bglw commented

No worries! Some notes for when you do, or for anyone else who comes across this issue and wants to take it on:

I don't mind what tool is used to actually release the Python package, but it doesn't need to worry about actually building Pagefind, so it doesn't need any level or support for (or knowledge of) Rust.

The publishing will slot into the existing action alongside the publish-binary-npm-packages step, which get given all built assets to download:

- uses: actions/download-artifact@v3
with:
name: release
path: build-artifacts

So the starting point will have the built executables for every platform, and the checksums for each executable :) All it needs to do it package the files up and publish it.

Hey, I had some time and took a look at the node.js wrapper. Would you mind passing pages/other content by-referneve over FFI rather than the service rpc to avoid serialization/deserialization overhead?

bglw commented

I'd hesitate to move things to FFI; I don't want it to be a requirement for creating a wrapper package, as the current "external" communication means you can wrap it with anything that can read/write to stdio, or indeed it could talk over a network.

If the overhead does become a performance bottleneck though, that should be solved.

I based a lot of the concept on esbuild's Go service API, which avoids FFI. Notably, over a certain message size they pass content via temporary files which is quicker and has a lower de-serialization overhead compared to the stdio messaging. Pagefind doesn't do this yet, so trying out a similar idea would be the first port of call if we need to improve things. I haven't seen any discussion around their wrapper performance being a problem so I have some confidence this will be good enough for us too.

But, if there was still too much overhead with that approach and it was impacting an actual use-case, I'd be open to looking into FFI ๐Ÿ™‚ (I would want to keep the current workflow though, so it would mean maintaining two service APIs)

That makes sense, and avoiding FFI will certainly speed up implementation.

On a different topic, have you considered putting together an ARCHITECTURE.md for this repo? I'd like to hear more about the trade-offs/decisions that got pagefind to its current state.

bglw commented

It's a good idea! (and I do swoon over matklads posts ๐Ÿ˜„)

If I can find the perfect task to procrastinate from by writing one, it just might happen ๐Ÿ˜‰

Hi @bglw, I've got a working draft of the Python API over in my fork of this repo (not yet PR'd here since I didn't see an drop-down with the option to create a draft PR). Could you take a look at the integration test and tell me what you think?

I'm planning on following the packaging pattern laid out in https://simonwillison.net/2022/May/23/bundling-binary-tools-in-python-wheels/ to include the built pagefind binary at the root of the python package.

Hey, since I haven't got any feedback on my proposed wrapper API, I'm going to move the python-related code to a separate repo in order to deploy independent of this repo's release cycle. I'm thinking of deploying the experimental python API in a package named pagefind-python, and the binary as pagefind-python-bin in order to reserve the pagefind package name for official use.

bglw commented

Hey @SKalt โ€” apologies, I missed that message ๐Ÿ™ˆ

I won't have time this week, but I can give it a proper review next week. Happy for you to open a standard PR and mention that it's a draft, equally happy for you to continue in a separate repo and publish yourself :)

If you want to publish separately for now and later merge into an official package release once it's less experimental, we can do that too

Hi @bglw, WIP PR sent! Let me know what you think of the API.

SKalt commented

https://github.com/sKalt/pagefind_python <- you can find the python API here, now with CI and a rough--but-passable build system.

bglw commented

CI scripts looking good ๐Ÿ‘