pyodide/pyodide-lock

RFC 2: partial lockfiles

rth opened this issue · 3 comments

rth commented

As proposed by @bollwyvl in a comment on #4:

The in-flight JupyterLite PR works by:

  • generating partial lockfiles
    • these wouldn't be sufficient to run anything
  • layering them, by name, on top of the as-loaded runtime with whatever index is hosted with it
      • this could be hoisted into, and normalized by, pyodide's initializer, e.g. with an option like extraLockUrls: []
      • this could be non-destructive, which might be better, and could potentially support "punch-out" of a package by setting its name to null (ick!)
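A minimal sketch of the layering step, assuming each lockfile has already been loaded as a plain dict with a top-level "packages" mapping keyed by package name, as in pyodide-lock.json. `layer_lockfiles` is a hypothetical helper, not part of pyodide or pyodide-lock. It fetches nothing, so it stands in for what an option like extraLockUrls might do after downloading. It leaves the base untouched and treats a `None` value as the "punch-out" described above.

```python
import copy
from typing import Any

# Assumed shape: {"info": {...}, "packages": {name: spec_dict, ...}}
Lockfile = dict[str, Any]


def layer_lockfiles(base: Lockfile, *partials: Lockfile) -> Lockfile:
    """Overlay partial lockfiles on a base lockfile, by package name.

    The base is never mutated. Later partials win over earlier ones.
    A package whose spec is None in a partial is removed ("punch-out").
    """
    merged = copy.deepcopy(base)
    packages = merged.setdefault("packages", {})
    for partial in partials:
        for name, spec in partial.get("packages", {}).items():
            if spec is None:
                # Punch-out: drop the package from the merged view.
                packages.pop(name, None)
            else:
                # Add a new package or replace an existing one by name.
                packages[name] = copy.deepcopy(spec)
    return merged


# Usage: swap jedi for a (hypothetical) shim wheel and punch out parso.
base = {
    "info": {"arch": "wasm32"},
    "packages": {
        "ipython": {"version": "8.0.0", "depends": ["jedi"]},
        "jedi": {"version": "0.19.0", "depends": ["parso"]},
        "parso": {"version": "0.8.3", "depends": []},
    },
}
partial = {
    "packages": {
        "jedi": {"version": "0.0.1", "depends": [],
                 "file_name": "jedi_shim-0.0.1-py3-none-any.whl"},
        "parso": None,
    }
}
merged = layer_lockfiles(base, partial)
print(merged["packages"]["jedi"]["version"])  # 0.0.1
print("parso" in merged["packages"])          # False
print(base["packages"]["jedi"]["version"])    # 0.19.0
```

Copying rather than mutating keeps the as-loaded runtime lockfile reusable, which is one way to read "non-destructive" here.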

The concrete things this solves there:

  • auto-install-on-import
  • replacing packages in the pyodide stdlib
    • IPython has an optional dependency on jedi/parso (but still downloads them)
      • we haven't been able to make jedi work well enough in JupyterLite to be usable, so we would rather replace it with dummy shims and save a couple of megabytes on the wire
    • per-PR docs builds of packages that are in the pyodide stdlib, where really only one package needs to be different
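Auto-install-on-import needs a way to get from an import name to the package that provides it. The sketch below assumes each package spec carries an "imports" list, as entries in pyodide-lock.json do; `build_import_index` and `packages_to_install` are hypothetical helpers, and they deliberately do not resolve transitive dependencies. Because the index is built from whichever lockfile is loaded, a package contributed by a layered partial lockfile would become auto-installable in the same way.

```python
from typing import Any


def build_import_index(lockfile: dict[str, Any]) -> dict[str, str]:
    """Map each top-level import name to the package that provides it.

    Specs without an "imports" list contribute nothing.
    """
    index: dict[str, str] = {}
    for name, spec in lockfile.get("packages", {}).items():
        for import_name in spec.get("imports", []):
            index[import_name] = name
    return index


def packages_to_install(imported: list[str], lockfile: dict[str, Any]) -> list[str]:
    """Return lockfile packages needed for (possibly dotted) module names.

    Modules not provided by any package (e.g. stdlib) are ignored.
    """
    index = build_import_index(lockfile)
    needed: list[str] = []
    for module in imported:
        top_level = module.split(".")[0]
        pkg = index.get(top_level)
        if pkg is not None and pkg not in needed:
            needed.append(pkg)
    return needed


lock = {
    "packages": {
        "numpy": {"imports": ["numpy"]},
        "scikit-learn": {"imports": ["sklearn"], "depends": ["numpy", "scipy"]},
    }
}
print(packages_to_install(["sklearn.linear_model", "numpy", "os"], lock))
# ['scikit-learn', 'numpy']
```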

Alternative proposals: #8 #10

rth commented

Thanks for the proposal, @bollwyvl! From the JupyterLite side, I can see why one might be tempted to do something like this: it interacts minimally with the upstream pyodide-lock.json and gets something working reasonably fast.

On one hand, I agree that no one wants to create yet another package manager; on the other, I'm concerned that while such a solution would be easy to start with, it might be difficult to maintain or debug over time. Generating a lock file for packages is a fairly standard task, and there are existing tools we could reuse. This approach, by contrast, is a fairly new packaging concept, so any issues would be ours to solve. Unless you are aware of a packaging project that does something similar?

With #7 done, the next step toward something usable outside pyodide-build would probably be a public API that can generate a single PackageSpec from an at-rest .whl, whether that wheel is not yet on PyPI, already in a pyodide-lock.json, or otherwise uncharacterized.

Some thoughts on API design:

  • only accept Path
  • expose as...
    • pyodide_lock.spec_from_wheel as a standalone function
    • pyodide_lock.PackageSpec.from_wheel as a class method
    • both, with the implementation in a separate file from the spec itself
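A sketch of what the proposed spec_from_wheel / PackageSpec.from_wheel pair could look like, with both spellings sharing one implementation and only Path accepted. To stay self-contained it swaps the strawman's pkginfo and packaging for the standard library (zipfile, email.parser, hashlib), and it returns a hypothetical `WheelSpec` dataclass with a few fields rather than the real pyodide_lock.PackageSpec. Dependency parsing is deliberately crude: it keeps bare requirement names and skips anything whose marker mentions an extra, where packaging would evaluate markers properly.

```python
import hashlib
import re
import tempfile
import zipfile
from dataclasses import dataclass, field
from email.parser import Parser
from pathlib import Path

# Bare distribution name at the start of a Requires-Dist value.
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


@dataclass
class WheelSpec:
    """Hypothetical stand-in for pyodide_lock.PackageSpec (a few fields only)."""

    name: str
    version: str
    file_name: str
    sha256: str
    depends: list[str] = field(default_factory=list)

    @classmethod
    def from_wheel(cls, path: Path) -> "WheelSpec":
        # Class-method spelling; delegates to the standalone function.
        return spec_from_wheel(path)


def spec_from_wheel(path: Path) -> WheelSpec:
    """Build a spec from a wheel already on disk. Only Path is accepted."""
    if not isinstance(path, Path):
        raise TypeError(f"expected pathlib.Path, got {type(path).__name__}")
    with zipfile.ZipFile(path) as whl:
        # Core metadata lives at <name>-<version>.dist-info/METADATA.
        metadata_file = next(
            (n for n in whl.namelist()
             if n.count("/") == 1 and n.endswith(".dist-info/METADATA")),
            None,
        )
        if metadata_file is None:
            raise ValueError(f"{path.name}: no .dist-info/METADATA found")
        metadata = Parser().parsestr(whl.read(metadata_file).decode("utf-8"))

    depends = []
    for req in metadata.get_all("Requires-Dist") or []:
        requirement, _, marker = req.partition(";")
        if "extra" in marker:
            continue  # optional dependency, only needed for an extra
        match = _NAME_RE.match(requirement.strip())
        if match:
            depends.append(match.group(0))

    return WheelSpec(
        name=metadata["Name"],
        version=metadata["Version"],
        file_name=path.name,
        sha256=hashlib.sha256(path.read_bytes()).hexdigest(),
        depends=depends,
    )


# Usage: write a tiny throwaway wheel and read it back.
with tempfile.TemporaryDirectory() as tmp:
    whl_path = Path(tmp) / "demo_pkg-0.1.0-py3-none-any.whl"
    with zipfile.ZipFile(whl_path, "w") as zf:
        zf.writestr(
            "demo_pkg-0.1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: demo-pkg\nVersion: 0.1.0\n"
            "Requires-Dist: numpy>=1.20\n"
            'Requires-Dist: pytest; extra == "test"\n',
        )
    spec = WheelSpec.from_wheel(whl_path)
    expected_sha = hashlib.sha256(whl_path.read_bytes()).hexdigest()

print(spec.name, spec.version, spec.depends)  # demo-pkg 0.1.0 ['numpy']
```

Keeping the logic in the free function and making the class method a one-line delegate is one way to get the "both, in a separate file" option without duplicating code.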

Out of scope:

  • downloading .whl would incur a lot more complexity
  • generating full pyodide-lock.json
  • CLI

I've got a working (but potentially out-of-date) strawman, though I'm not precious about it... if there's another one in pyodide-build, that's fine, too.

The strawman implementation uses pkginfo (MIT, 30k .whl, no deps) and packaging (BSD, 50k .whl, no deps). These could go in an extra and be lazily imported inside the implementation, since they are not required for validation... but then pydantic is at least 150kb, so maybe who's counting (and [extras] are... kinda broken).
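The lazy-import idea can be sketched as a small helper that imports an optional dependency only at call time and fails with an installation hint. `require_optional`, `read_wheel_metadata`, and the `[wheel]` extra name are hypothetical; `pkginfo.Wheel` is pkginfo's wheel-metadata reader, used here only as an illustration of a deferred call site.

```python
import importlib
from pathlib import Path
from types import ModuleType


def require_optional(module_name: str, extra: str = "wheel") -> ModuleType:
    """Import an optional dependency at call time, with an install hint.

    The default extra name is hypothetical and only appears in the message.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name!r} is needed for this feature; "
            f"install it with: pip install 'pyodide-lock[{extra}]'"
        ) from err


def read_wheel_metadata(path: Path):
    # Deferred import: code that only validates a lockfile never runs this,
    # so pkginfo is not needed unless wheels are actually inspected.
    pkginfo = require_optional("pkginfo")
    return pkginfo.Wheel(str(path))
```

This keeps plain validation free of pkginfo and packaging, at the cost of leaning on extras for everyone who does inspect wheels.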

Added the above as a draft in #18... I probably won't have any more time to respond this weekend, but I had this kinda working in #17 before descoping, so I figured I'd push it up so it could be discussed later.