pyodide/micropip

My attempt to use index_urls for installing a scikit-learn wheel from pypi.anaconda.org


Why I want to do this

It would be nice to be able to build stable and dev wheels for scikit-learn and make them easily installable inside Pyodide. This would let users control which version of scikit-learn they use, independently of the version provided by the Pyodide distribution.

pypi.anaconda.org seems like a good place, since scikit-learn (and other scientific packages) already use it, for example to host nightly wheels; see https://anaconda.org/scientific-python-nightly-wheels.

This was mentioned elsewhere, e.g. in a comment on pyodide/pyodide#3049.

What I tried

I uploaded a scikit-learn wheel to https://pypi.anaconda.org/lesteve/simple/scikit-learn/

I was hoping to use it with index_urls in the latest Pyodide REPL (although I expected CORS issues to come into play):

import micropip
# micropip.install is a coroutine; the Pyodide REPL supports top-level await.
await micropip.install('scikit-learn', index_urls='https://pypi.anaconda.org/lesteve/simple')

Looking at my browser console, I was surprised to realise that the scikit-learn wheel was coming from https://cdn.jsdelivr.net/pyodide/dev/full/ rather than from the index_urls I specified. For the use case I outlined at the beginning, I would like index_urls to take precedence over the packages Pyodide provides.

Maybe this is expected because index_urls is only meant to work with pure Python wheels, not compiled ones? Or maybe packages provided by Pyodide have a special status in micropip?

Full disclosure: I then tried building a Pyodide distribution locally without scikit-learn. In that case index_urls is indeed used, but (as somewhat expected) there is a CORS issue.

Oh, thanks for the report. Currently, when you call micropip.install("pkg"), micropip does the following:

  1. Look up repodata.json (pyodide-lock.json) to check whether the package is in the Pyodide distribution.
  2. If it is, load it from the Pyodide distribution.
  3. If it isn't, fall back to the index URLs.
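A minimal sketch of the lookup order just described, assuming a hypothetical `resolve_source` helper that receives the lockfile contents as a plain dict and a caller-supplied `find_in_index` callback standing in for the network query. This is an illustration of the ordering only, not micropip's actual code.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for querying one index: returns a wheel reference or None.
FindInIndex = Callable[[str, str], Optional[str]]


def resolve_source(
    name: str,
    lock_packages: Dict[str, str],
    index_urls: List[str],
    find_in_index: FindInIndex,
) -> Tuple[str, str]:
    """Return (source, wheel) using the current order: lockfile first, then index URLs."""
    # Steps 1-2: if the Pyodide distribution ships the package, load it from there.
    if name in lock_packages:
        return ("pyodide-lock.json", lock_packages[name])
    # Step 3: otherwise fall back to the index URLs, in the order given.
    for url in index_urls:
        wheel = find_in_index(url, name)
        if wheel is not None:
            return (url, wheel)
    raise LookupError(f"no source found for {name!r}")


# Example: scikit-learn is both in the lockfile and on the custom index.
CUSTOM = "https://pypi.anaconda.org/lesteve/simple"
lock = {"scikit-learn": "wheel from the Pyodide CDN"}


def fake_index(url: str, name: str) -> Optional[str]:
    # Hypothetical index contents: only the custom index has a wheel.
    return "wheel from the custom index" if url == CUSTOM else None


print(resolve_source("scikit-learn", lock, [CUSTOM], fake_index))
# → ('pyodide-lock.json', 'wheel from the Pyodide CDN')
```

Because the lockfile is checked first, the custom index is never queried for scikit-learn, which matches what the browser console showed in the report.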

This order was chosen because PyPI used to be the only package repository micropip used, and PyPI can't host WASM wheels. Now that a different package repository can be used by changing the index URL, it makes sense to add an option to turn this behavior (looking up pyodide-lock.json first) on or off.
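Whether a wheel is pure Python or built for WASM is visible in the platform tag of its filename. A rough, standard-library-only sketch, assuming well-formed wheel filenames of the form `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. The helper names, the `wasm32` substring check, and the example filenames are illustrative assumptions, not micropip's own logic.

```python
def wheel_platform_tag(filename: str) -> str:
    """Return the platform tag of a wheel filename (its last dash-separated field)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name, version, [build tag], python tag, abi tag, platform tag
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return parts[-1]


def classify_wheel(filename: str) -> str:
    """Roughly classify a wheel as pure Python, WASM, or other native code."""
    platform = wheel_platform_tag(filename)
    if platform == "any":
        return "pure-python"  # platform-independent, the kind PyPI could serve
    if "wasm32" in platform:
        return "wasm"  # e.g. emscripten_*_wasm32 wheels built for Pyodide
    return "native"  # e.g. manylinux wheels, which Pyodide cannot load


# Illustrative filenames only.
for name in [
    "six-1.16.0-py2.py3-none-any.whl",
    "scikit_learn-1.3.1-cp311-cp311-emscripten_3_1_45_wasm32.whl",
    "scikit_learn-1.3.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
]:
    # Prints pure-python, wasm, native for the three filenames in order.
    print(name, "->", classify_wheel(name))
```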

Maybe we need to add something like micropip.install(pkg, lookup_index_urls_first=True) to change the search order: index_urls first, then pyodide-lock.json.
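A sketch of how the proposed `lookup_index_urls_first` option could swap the search order. The parameter name comes from the comment above and is only a proposal; the `resolve_source` helper, its dict-based lockfile, and the `find_in_index` callback are hypothetical stand-ins, not micropip's API.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for querying one index: returns a wheel reference or None.
FindInIndex = Callable[[str, str], Optional[str]]
Found = Optional[Tuple[str, str]]


def resolve_source(
    name: str,
    lock_packages: Dict[str, str],
    index_urls: List[str],
    find_in_index: FindInIndex,
    lookup_index_urls_first: bool = False,
) -> Tuple[str, str]:
    """Return (source, wheel), checking index URLs first when the flag is set."""

    def from_lockfile() -> Found:
        # Packages shipped with the Pyodide distribution (pyodide-lock.json).
        if name in lock_packages:
            return ("pyodide-lock.json", lock_packages[name])
        return None

    def from_indexes() -> Found:
        # Query each index URL in the order given.
        for url in index_urls:
            wheel = find_in_index(url, name)
            if wheel is not None:
                return (url, wheel)
        return None

    if lookup_index_urls_first:
        lookups: List[Callable[[], Found]] = [from_indexes, from_lockfile]
    else:
        lookups = [from_lockfile, from_indexes]
    for lookup in lookups:
        found = lookup()
        if found is not None:
            return found
    raise LookupError(f"no source found for {name!r}")
```

In this sketch the flag defaults to `False`, which keeps the current behavior; with `lookup_index_urls_first=True`, a scikit-learn wheel found on the custom index would win over the one listed in pyodide-lock.json, and the lockfile is still used when no index has the package.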

OK, makes sense. Being able to change the priority order would be great for my use case.

My naive expectation would be that if I provide index_urls, I probably want them to be used first, so maybe lookup_index_urls_first=True should be the default? It may still be useful to add the lookup_index_urls_first parameter for flexibility over the priority order.

rth commented

> I provide index_urls I probably want it to be used first

Yes, I agree with that. Can't we make this the default, @ryanking13? Even if the added index only has non-WASM wheels, IMO it should still have the highest priority. So the priority would be:

  • Custom index added
  • pyodide-lock.json
  • pypi

Though yes, it's then a bit annoying that PyPI and the custom index are defined via the same API but should have different priorities.
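The priority proposed here (custom index, then pyodide-lock.json, then PyPI) can be sketched by passing custom indexes and PyPI as separate inputs, which sidesteps having both defined through the same parameter. The names `resolve_with_priority`, `custom_index_urls`, `find_in_index`, and `fake_index` are hypothetical; `https://pypi.org/simple` is PyPI's simple index, used here only as the fallback URL.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for querying one index: returns a wheel reference or None.
FindInIndex = Callable[[str, str], Optional[str]]

PYPI_SIMPLE = "https://pypi.org/simple"


def resolve_with_priority(
    name: str,
    custom_index_urls: List[str],
    lock_packages: Dict[str, str],
    find_in_index: FindInIndex,
    pypi_url: str = PYPI_SIMPLE,
) -> Tuple[str, str]:
    """Return (source, wheel): custom indexes, then pyodide-lock.json, then PyPI."""
    # 1. User-supplied indexes win, even if they only host pure Python wheels.
    for url in custom_index_urls:
        wheel = find_in_index(url, name)
        if wheel is not None:
            return (url, wheel)
    # 2. Packages shipped with the Pyodide distribution.
    if name in lock_packages:
        return ("pyodide-lock.json", lock_packages[name])
    # 3. PyPI as the last resort.
    wheel = find_in_index(pypi_url, name)
    if wheel is not None:
        return (pypi_url, wheel)
    raise LookupError(f"no source found for {name!r}")


# Hypothetical index contents for the example.
CUSTOM = "https://pypi.anaconda.org/lesteve/simple"
available = {
    (CUSTOM, "scikit-learn"): "scikit-learn wheel from the custom index",
    (PYPI_SIMPLE, "six"): "six wheel from PyPI",
}
lock = {"scikit-learn": "Pyodide's scikit-learn", "numpy": "Pyodide's numpy"}


def fake_index(url: str, name: str) -> Optional[str]:
    return available.get((url, name))


print(resolve_with_priority("scikit-learn", [CUSTOM], lock, fake_index)[0])
# → https://pypi.anaconda.org/lesteve/simple
print(resolve_with_priority("numpy", [CUSTOM], lock, fake_index)[0])
# → pyodide-lock.json
print(resolve_with_priority("six", [CUSTOM], lock, fake_index)[0])
# → https://pypi.org/simple
```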

@rth Yes, that sounds more natural to me. It's a bit annoying but it's the behavior users expect.