Build self-contained native binaries
amotl opened this issue · 5 comments
About
Should we use traditional PyInstaller to build self-contained native binaries? Alternatively, let's try Briefcase or PyApp.
ctk.exe, anyone?
Introduction
We discussed how to derive subsets of CrateDB Toolkit's functionality into dedicated release artefacts, which do not necessarily need to follow the development iterations and release cadence of Toolkit itself.
Most prominently, this use case appeared with GH-88 and GH-153. We are sharing our thoughts here about our first approach to that topic, following monorepo paradigms.
Proposal
This is our first proposal, based on preliminary discussions around needs, requirements, and their possible benefits or pitfalls.
Driver
- There will be a GitHub Actions (GHA) workflow that slices the package by labels corresponding 1:1 to setuptools "extra" labels.
- Signals to the driver will be submitted as plain Git tags pushed to the remote, as LangChain and others do.
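A minimal sketch of how the driver could interpret such a signal. It assumes a hypothetical tag format "<extra>/<version>" (e.g. "cfr/2024.6.1"), a hypothetical allow-list of extras, and the PyPI package name cratedb-toolkit; the proposal itself does not define the tag layout.

```python
import re
from typing import NamedTuple

# Hypothetical: the extras that may be sliced into standalone artefacts.
KNOWN_EXTRAS = {"cfr"}

# Hypothetical tag format "<extra>/<year>.<month>.<iteration>", e.g. "cfr/2024.6.1".
TAG_PATTERN = re.compile(
    r"^(?P<extra>[a-z][a-z0-9-]*)/(?P<version>\d{4}\.(?:[1-9]|1[0-2])\.[1-9]\d*)$"
)


class BuildSignal(NamedTuple):
    extra: str
    version: str
    requirement: str  # what the workflow would hand to "pip install"


def parse_tag(tag: str) -> BuildSignal:
    """Translate a pushed Git tag into the extra to install and the artefact version."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Tag does not match '<extra>/<version>': {tag!r}")
    extra = match["extra"]
    if extra not in KNOWN_EXTRAS:
        raise ValueError(f"Unknown extra: {extra!r}")
    # One setuptools extra corresponds 1:1 to one slice of the package.
    return BuildSignal(extra, match["version"], f"cratedb-toolkit[{extra}]")


print(parse_tag("cfr/2024.6.1"))
```

Rejecting unknown extras early keeps a mistyped tag from triggering a build of an arbitrary slice.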
Publisher
In order to detach from the regular cadence and the related build processes, we propose to:
- Use a date-based versioning scheme [1][2], in order to signal low churn and to give users an immediate sense of the "software age". For example, a possible first artefact for CrateDB CFR could be cratedb-cfr-2024.6.1.exe, when applicable [3].
- Upload artefacts to ghcr.io instead of using GitHub release assets, when possible, as Homebrew apparently does.
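The date-based scheme from footnote [3], "{year}.{month}.{iteration-skipping-zero}", can be computed mechanically. The following is a sketch under the assumption that the driver knows which versions were already released; the function names are hypothetical.

```python
from datetime import date


def next_version(today: date, existing: list[str]) -> str:
    """Return the next "{year}.{month}.{iteration}" version, with iterations starting at 1."""
    prefix = f"{today.year}.{today.month}."
    # Collect iterations already released in the current month.
    iterations = [
        int(version[len(prefix):])
        for version in existing
        if version.startswith(prefix) and version[len(prefix):].isdigit()
    ]
    # The first release of a month gets iteration 1, never 0.
    return f"{prefix}{max(iterations, default=0) + 1}"


def artefact_name(product: str, version: str, suffix: str = ".exe") -> str:
    """Build a file name following the pattern "cratedb-{product}-{version}{suffix}"."""
    return f"cratedb-{product}-{version}{suffix}"


version = next_version(date(2024, 6, 15), existing=[])
print(artefact_name("cfr", version))  # cratedb-cfr-2024.6.1.exe
```

The month is written without a leading zero, which matches how PEP 440 normalizes numeric release segments anyway.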
Footnotes
1. https://packaging.python.org/en/latest/specifications/version-specifiers/#version-epochs
2. https://packaging.python.org/en/latest/specifications/version-specifiers/#final-releases
3. cratedb-cfr-{year}.{month}.{iteration-skipping-zero}.exe
Hi there,
I appreciate the idea of using setuptools "extra" labels for slicing the distribution by defining a subset of dependencies. I think it is absolutely the right choice to use that mechanism for that very purpose.
On the other hand, as discussed, PyInstaller apparently expects an output name for the binary executable. I strongly believe this is orthogonal to the dependency selection process and should be conveyed to the "driver" through a separate variable.
To tie it together, I would be fine with the "driver" synthesizing it from a single unique label/tag, e.g. by mapping cfr => cratedb-cfr-{version}.exe, or similar.
Just sharing my humble thoughts on this matter for your consideration; otherwise, please »go ahead« ;].
With kind regards,
Andreas.
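The mapping suggested in the comment above (cfr => cratedb-cfr-{version}.exe) keeps dependency selection and binary naming as two separate variables, both derived from one label. A minimal sketch, assuming a hypothetical entry-point script name; only PyInstaller's "--onefile" and "--name" options are used, and the command is constructed, not executed.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceSpec:
    label: str
    version: str

    @property
    def requirement(self) -> str:
        # Dependency selection: the label doubles as the setuptools extra.
        return f"cratedb-toolkit[{self.label}]"

    @property
    def output_name(self) -> str:
        # Binary name: synthesized separately, e.g. cfr -> cratedb-cfr-{version}.
        return f"cratedb-{self.label}-{self.version}"

    def pyinstaller_command(self, entrypoint: str) -> list[str]:
        # --onefile bundles everything into a single executable; --name sets its name.
        return ["pyinstaller", "--onefile", "--name", self.output_name, entrypoint]


spec = SliceSpec(label="cfr", version="2024.6.1")
print(f"pip install '{spec.requirement}'")
print(shlex.join(spec.pyinstaller_command("cfr_entrypoint.py")))  # hypothetical script
```

On Windows, the resulting executable is expected to carry the .exe suffix, so the name passed to PyInstaller can stay platform-neutral.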
Status Update: GHCR is not suitable
It looks like GHCR is not suitable for hosting and distributing standalone artefacts of arbitrary nature. What GHCR provides is a registry for OCI images, which, in Homebrew's case for example, are apparently unwrapped by the brew installer program to derive its "bottles" packages from them again.
-- https://github.blog/2021-06-21-github-packages-container-registry-generally-available/
Therefore, it is not suitable for our use case, and we need to find a different solution. Maybe JFrog, or maybe just put the artefacts onto our HTTP server, as we already do with the cratedb-prometheus-adapter and others, like the standalone version of crash? https://build.opensuse.org/ could also be an option, but it might be too focused on building and on Linux to be an adequate generic solution for distributing binaries of arbitrary nature.
We will continue on this topic next week, and we welcome any suggestions.
A Build Matrix and Upload to GitHub Workflow Artifacts
Hi again. As a start, I've expanded @seut's patch (thanks!) with 7fb9df3, adding a GitHub Actions workflow recipe that defines a build matrix and invokes poe build-cfr to build the relevant artifacts and publish them as GitHub Workflow Artifacts.
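A GitHub Actions build matrix expands its axes into one job per combination (ignoring include/exclude rules). The sketch below mimics that expansion; the runner labels and the "extra" axis are assumptions and may not match 7fb9df3, and generalizing "poe build-cfr" to "poe build-{extra}" is hypothetical.

```python
from itertools import product

# Hypothetical matrix axes; the workflow in 7fb9df3 may use different runners or keys.
MATRIX = {
    "os": ["ubuntu-latest", "macos-latest", "windows-latest"],
    "extra": ["cfr"],
}


def expand(matrix: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one job per combination of axis values (a cartesian product)."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in product(*(matrix[key] for key in keys))]


for job in expand(MATRIX):
    # Each job would run the build task on its own runner, then upload the binary.
    print(f"{job['os']}: poe build-{job['extra']}")
```

Since PyInstaller does not cross-compile, one runner per target operating system is what produces one native binary per platform.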
Example
Workflow: https://github.com/crate-workbench/cratedb-toolkit/actions/runs/9826830191
Backlog
Building upon this, we can think about other build and publishing destinations, procedures, and cycles in subsequent iterations.
@hammerhead was quick to spot that the current procedure is not sufficiently sustainable yet, see crate/cratedb-guide#55 (comment):
What is the approach to keep these links to release bundles up-to-date? I noticed it currently links to a specific GitHub Action run. Is there a possibility to have the artifacts as part of the regular release assets (https://github.com/crate-workbench/cratedb-toolkit/releases), so updating it here is just a matter of keeping it in sync with the latest cratedb-toolkit version number?
Yes, we need to improve the release and publishing procedure. @seut and I have already discussed it, but because of other obligations, we did not want to block the current minimal implementation.
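If the binaries become regular release assets, documentation links can be derived from the version number alone, or avoided entirely via GitHub's "latest" redirect. A sketch, assuming hypothetical release tags of the form "v<version>" and stable asset names; the version "1.2.3" is a placeholder.

```python
BASE = "https://github.com/crate-workbench/cratedb-toolkit/releases"


def versioned_asset_url(version: str, asset: str) -> str:
    # Assumes release tags of the form "v<version>" (hypothetical).
    return f"{BASE}/download/v{version}/{asset}"


def latest_asset_url(asset: str) -> str:
    # GitHub redirects this path to the asset of the same name in the latest release,
    # so a link using it does not need updating, as long as the asset name stays stable.
    return f"{BASE}/latest/download/{asset}"


print(versioned_asset_url("1.2.3", "cratedb-cfr.exe"))  # placeholder version and asset
print(latest_asset_url("cratedb-cfr.exe"))
```

The trade-off: a versioned asset name like cratedb-cfr-2024.6.1.exe works with the first form, while the "latest" form requires dropping the version from the file name.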
NB: The default retention period for GitHub Workflow Artifacts is 90 days, so we need to improve this within the next three months. I think it is feasible.
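The 90-day window translates into a concrete deadline per upload. A small sketch with hypothetical dates; the expiry is approximate, since it ignores the exact upload time.

```python
from datetime import date, timedelta

# Default retention period for GitHub Workflow Artifacts.
RETENTION = timedelta(days=90)


def expires_on(uploaded: date, retention: timedelta = RETENTION) -> date:
    """Approximate the date after which an artifact uploaded on `uploaded` is gone."""
    return uploaded + retention


def days_left(uploaded: date, today: date) -> int:
    """Days remaining until expiry; non-positive values mean it has (approximately) expired."""
    return (expires_on(uploaded) - today).days


# Hypothetical dates, for illustration only.
print(expires_on(date(2024, 7, 5)))  # 2024-10-03
print(days_left(date(2024, 7, 5), today=date(2024, 9, 1)))
```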