flatironinstitute/CaImAn

Add dependencies to `pyproject.toml`

Closed this issue · 7 comments

Hello!

I see a previous issue on packaging re: adding to pypi - #509 - but this is sufficiently distinct, and the python packaging world has changed dramatically since 2019.

Currently the pyproject.toml file doesn't specify any dependencies and they are just listed in requirements.txt

Moving the requirements.txt to pyproject.toml would be essentially a no-effort change that allows the package to be installed via pip/any standards-compliant package manager, and if there are differences in the gpu requirements, those could be added as a extra or optional dependencies.

Doing that should also allow you to make a pypi-ready package for free, even if you just put source distributions up there and don't want to worry about building wheels for various platforms and versions, but most importantly it allows people to build downstream tools on top of CaImAn. Currently that's impossible unless you add all the requirements to downstream packages manually, and then that package is on the hook with maintaining a parallel set of deps. Similarly, not being on pypi prevents people building downstream tools except via conda, which is not really ideal for making a generalizable python tool.

Should be no additional maintenance burden (and in fact a lesser one where you can single-source the deps in a format that all python package tooling will support for the indefinite future).

I can PR if y'all are strapped for time, trying it locally and it works exactly as expected. :)

Right now we're mostly only supporting conda-forge based installs, although I see little harm in doing this. We mostly provide requirements.txt for people either building from source (which we mostly discourage) or for people running in colab.

My understanding, based on the last time I read about these packaging changes, is that pyproject actually can work fine alongside requirements.txt - one doesn't need to pick one or the other and the tooling will generally use both if present, correctly. Is that no longer the case? I think we probably don't need to focus as much on legacy compatibility though, so this is probably a good thing (even though I'm not sure how high-priority it should be given that it's unclear to me if it actually affects anything).

I'll put this on the todo list.

building from source (which we mostly discourage)

totally understand not wanting to take on another packaging modality due to maintenance burden, but hear me out! If you didn't want to upload to pypi, just adding the deps to pyproject.toml still has benefits without drawbacks, and if you did want to upload to pypi, doing so would also allow you to build working wheels so people wouldn't need to install from source :)

pyproject actually can work fine alongside requirements.txt

Typically when people use both, they put the "abstract" package specifications (either just the bare package name or package with some generous minimum bound) in the pyproject.toml and then will use requirements.txt for exact version pins. The python packaging ecosystem has evolved where that's mostly unnecessary tho, where multiple packaging tools like pdm and poetry will automatically create lockfiles that serve the purpose of strict environment pinning better.

the tooling will generally use both if present

most don't or require additional config - hatch needs a plugin to read requirements.txt, setuptools can only handle one or the other s.t. one sets tools.setup.tools.dynamic.dependencies = {file = "requirements.txt"} and ensures that it's present in an sdist (although this defeats the purpose & benefits of static dep specification), pdm can import/export requirements.txt but not use it for development/release. A project.dependencies table in a pyproject.toml file is now the standard :).

(even though I'm not sure how high-priority it should be given that it's unclear to me if it actually affects anything).

basically not having deps specified in pyproject.toml makes it impossible to build off of CaImAn. Even without being on PyPI, if the project had its deps specified it would be possible for us to add it as a dependency as a git repo. Without that, the only thing we really can do is copy/paste, add the GPL-2 license in our repo, and add a note that we have literal-included part of caiman -- which is both awkward and also deprives y'all of the much due credit from dependency graphs.

If I build a sdist/wheel of caiman right now, the dependencies aren't included (i.e. not read from the requirements.txt), and i (predictably) can't import the package due to missing deps.

There aren't any deps in the conda environment that actually require conda - i.e. with the exception of changing "opencv" to "opencv-python" and adding a requires-python field, if i copy and paste the conda environment into project.dependencies then i am able to install with pip no problem. the days of opencv and tensorflow being an awful pain to install are (mostly) over :).

Then, having done that, i'm able to build a wheel and an sdist, and with one further command i would be able to upload the package to pypi (but i won't, because i don't want to take the name ;)

Side benefit: managing deps using pyproject.toml would also allow you to make and maintain optional dependency groups so that caiman's runtime deps could be much lighter. Currently by default when installing the resulting venv is 2.6GB, which is a lot! If you used pyproject.toml, you could create optional groups for docs, tests, plotting, etc. where if someone wanted to just use some of the numerical routines they could avoid installing jupyter with its massive dep tree or deps that are only required in development (nose, yapf, mypy) or for plotting, etc.). You could single-source deps from your conda env where you just require . and optionally any other conda-specific deps you want - best of both worlds!


Anyway, tl;dr copy/paste 30 lines and you make your package available to the non-conda python world!

would be happy to PR

Fair enough; this stuff seems to have changed a lot over the last few years, and the guidance hasn't always been clear. I see no harm in the change. I'd accept a PR, but I'm also happy to do it myself; whatever you prefer on that front.

I think I'd probably want to avoid optional dependency groups, in order to keep it simple and also to make it clear to our direct users that we still prefer they use conda (at least, provided we don't need to revisit that decision - recently there have been some headaches on that front relating to packaging issues for tensorflow and pytorch)

(we certainly don't want to enourage vendoring - we've suffered when we've done that ourselves, for much smaller dependencies)

Whatever your decision, thanks for bringing it to our attention.

if someone wanted to just use some of the numerical routines

Just thought I'd pop in quickly to mention that this is easier said than done, #1391 does some of this.

pgunn commented

Resolved with #1432. We may at some point add or change the optional install categories, but for now I think this should do.

We're currently blocked on release because of some upstream packaging issues; it may take some time for this to make it into the main version, but it will.

Heck yeah, thanks y'all. I'll let my colleagues know :)

pgunn commented

@sneakers-the-rat Just keep in mind that it's still just in the dev branch.