google/xls

Consider creating binary artifacts via CI

cdleary opened this issue · 9 comments

Right now we require users to build XLS from source, which can take a long time because of some of our larger dependencies [1] (e.g. LLVM, Z3) and requires users to set up Bazel / a build environment.

For comparison, in JAX we implicitly publish XLA as a shared library by way of Python extensions that can be pip installed. If we wanted users to easily obtain XLS in Python (via our pybind11 bindings) we'd need to do similar and have extension-module creation flows. (Our pybind11 XLS bindings are notably exposing a fairly rich API surface area with ownership and complex types being passed and such.)

However, the open question is how to make sure C/C++ embeddings can work nicely too, in a way that fits naturally into the XLS development process.

If we can determine an appropriate set of "public" C++ APIs we should be able to have CI create a mostly-statically-linked Dynamic Shared Object (DSO) libxls.so and publish that with associated headers. No stability would be guaranteed (API or ABI, across any time window). However, publishing any exposed C++ API brings considerations around libstdc++ vs libc++ usage (and versions of those) for C++ types in the exposed APIs! Also, this still doesn't enable C embeddings, or the many other language environments that are happy to try to interop with C-API-exposed functionality.

Off the cuff, a conceptually nice approach would be to (1) make a flat C API to expose from a DSO, (2) use CFFI to build our Python bindings from that, and (3) create a C++ wrapper "client" library around (1) that's compile-time-polymorphic between the DSO being there and not being there (the latter being used for source builds). But a) we already have pybind11 working nicely, b) what I'm describing is a good chunk of work, and c) generally restricting ourselves to exposing all functionality via a flat C API object model feels like a bump in our typical development process [2] ... for a future embedding scenario that's neither Python nor C++.
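To make the layering concrete, here is a minimal sketch of parts (1) and (3): an opaque-handle C API plus a thin C++ RAII wrapper over it. Every name here (`xls_demo_package`, `xls_demo::Package`, and so on) is hypothetical and invented for illustration; none of it is an existing XLS API, and the "implementation" section stands in for whatever would actually live inside the DSO.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// ---- (1) Flat C API (hypothetical names, not part of XLS) ----
// Only C types cross this boundary, so libstdc++ vs libc++ differences in
// types like std::string never leak into the exposed ABI.
extern "C" {
typedef struct xls_demo_package xls_demo_package;  // Opaque handle.

xls_demo_package* xls_demo_package_create(const char* name);
void xls_demo_package_free(xls_demo_package* p);
const char* xls_demo_package_name(const xls_demo_package* p);
// Returns the number of functions in the package after the addition.
int64_t xls_demo_package_add_function(xls_demo_package* p);
}

// ---- Implementation side (stand-in for code inside the DSO) ----
struct xls_demo_package {
  std::string name;
  int64_t function_count;
};

extern "C" {
xls_demo_package* xls_demo_package_create(const char* name) {
  return new xls_demo_package{std::string(name ? name : ""), 0};
}
void xls_demo_package_free(xls_demo_package* p) { delete p; }
const char* xls_demo_package_name(const xls_demo_package* p) {
  return p->name.c_str();  // Valid until the package is freed.
}
int64_t xls_demo_package_add_function(xls_demo_package* p) {
  return ++p->function_count;
}
}

// ---- (3) C++ "client" wrapper built only on the C API ----
namespace xls_demo {
class Package {
 public:
  explicit Package(const std::string& name)
      : handle_(xls_demo_package_create(name.c_str()),
                &xls_demo_package_free) {
    assert(handle_ != nullptr);
  }
  // Copies the name out so callers do not depend on the handle's lifetime.
  std::string name() const { return xls_demo_package_name(handle_.get()); }
  int64_t AddFunction() {
    return xls_demo_package_add_function(handle_.get());
  }

 private:
  // unique_ptr with a custom deleter gives RAII ownership of the handle.
  std::unique_ptr<xls_demo_package, void (*)(xls_demo_package*)> handle_;
};
}  // namespace xls_demo
```

A caller would write something like `xls_demo::Package p("my_pkg"); p.AddFunction();` and never touch the raw handle. The same `extern "C"` declarations are what a CFFI-based Python binding (part 2) would consume.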

But it could be enabling! There are always cool use cases that are impossible to anticipate.

As a priority judgement call, I'd suggest we decide on the public API boundary, build a libstdc++ and a libc++ artifact in CI, and call it a day until we have more understanding of the upside of a simpler (C) embedding.

[1]: Even if we had a great solution for shared artifact caching (e.g. via RBE) it would still be useful for folks to not have to set up and use Bazel to try XLS; e.g. in some embedded context.

[2]: Forcing all the thinking into how to make a flat API (vs arbitrary C++ object model) can potentially slow things down vs just working in the C++ object model directly. But also flat APIs usually encourage nice design thinking on composition and orthogonality. On the flip side, for complex embeddings working directly with the richer C++ object model can be nice instead of depending on things to be explicitly exposed through the flat API. War story: in XLA we had the "client" boundary as a natural cut point, where things could either delegate via C++ objects or via protobuf RPCs, and since it was proto oriented it could even do things stably and via inter-process-communication, yet sophisticated embeddings didn't usually want to use the more limited API "cut" because they wanted "all the possible power". Also in XLS we don't have that mostly-stable sort of separation yet.

Another consideration, aside from a general DSO embedding artifact, is that we could build all of our tools/utilities (e.g. those in the tools/ subdirectory, as referenced in the tools quickstart guide) so those could be easily obtained without going through the whole build process.

We should also be sure to version these released artifacts, even if it's just the built set of tools. That lets us signify when major feature improvements / backwards incompatibilities occur (by conforming to semantic versioning). The version being worked on (if at HEAD) should be displayed prominently in the readme/docs.
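As a rough illustration of what conforming to semantic versioning buys a consumer of the artifacts, here is a small sketch that parses `MAJOR.MINOR.PATCH` strings and checks whether moving between two versions signals a backwards incompatibility. The helper names are hypothetical, pre-release/build suffixes are not handled, and treating a minor bump under `0.y.z` as breaking is an assumption (a common convention, since semver allows anything to change before 1.0.0).

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper type; field names avoid the `major`/`minor` macros
// that some older libc headers define.
struct SemVer {
  int major_num;
  int minor_num;
  int patch_num;
};

// Parses "MAJOR.MINOR.PATCH". Returns false on malformed input.
// Sketch only: pre-release and build-metadata suffixes are rejected.
bool ParseSemVer(const std::string& text, SemVer* out) {
  SemVer v{0, 0, 0};
  int consumed = 0;
  int fields = std::sscanf(text.c_str(), "%d.%d.%d%n", &v.major_num,
                           &v.minor_num, &v.patch_num, &consumed);
  if (fields != 3) return false;
  if (consumed != static_cast<int>(text.size())) return false;  // Trailing junk.
  if (v.major_num < 0 || v.minor_num < 0 || v.patch_num < 0) return false;
  *out = v;
  return true;
}

// True if going from `from` to `to` signals a possible incompatibility.
bool SignalsBreakingChange(const SemVer& from, const SemVer& to) {
  if (from.major_num != to.major_num) return true;
  // Assumption: during 0.y.z development, a minor bump is treated as breaking.
  if (from.major_num == 0 && from.minor_num != to.minor_num) return true;
  return false;
}
```

A download script or tool wrapper could run a check like this against the version embedded in a release artifact before swapping it in.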

It'd be neat to have executable artifacts built and available for download, even if the shared object library stuff hasn't been sorted out yet.

An update on this:

  1. Currently we have mechanisms for packaging the binary releases, but the actual release is pending going through an internal license review.
  2. XLS is currently distributed as an open-source project in source form, but that does not prevent third parties from building the source and caching the result for their use.

Any update?

Come to think of it, I think the Rust project uses a pre-compiled LLVM dependency and compiles everything else. If LLVM is the bottleneck here, have you considered doing the same?

We do have conda packages maintained by the https://github.com/hdl/conda-eda community, available at https://anaconda.org/LiteX-Hub/xls/, and there are some plans to provide a conda-forge equivalent as well (see hdl/conda-eda#193).

Some of the pybind11 bindings got removed in d27e24b, and I think the new plan of record is to go back to the approach described in #108 (comment), with a stricter C API exposed through CFFI (filed #1256 to track/discuss this effort).

/cc @ericastor @cdleary

Note there's active progress on this now, e.g. e5f3d0d is a prelude to a GitHub Action that can host release binaries. These will be available at canonical URLs and downloaded/used by e.g. the colab runtime linked from our README.