facebook/prophet

Remove `pystan` backend, move to `cmdstanpy`

tcuongd opened this issue · 18 comments

Context

Pystan 3+ makes some breaking changes to Prophet python package:

  • MAP estimation is removed.
  • Windows support is removed.

We've been continuing to use Pystan 2.19.1.1, but this is only supported up to Python 3.8 and no longer receives development. Many of our issues also relate to pystan issues, which will become harder to debug over time and may need hacky patches.

The current installation process with pystan is as follows:

Mac OS / Linux

  • End users usually use prophet from PyPi.
  • End user must install pystan=2.19.1.1 first, as it is required for building the wheel
  • End user installs prophet, which includes compiling the pystan model on their machine.

Windows

  • End users usually use prophet from conda-forge.

Proposal

  • Use cmdstan + cmdstanpy to build binary distributions for Pypi (using Github Actions) and conda-forge, across all platforms.
    • Packaging pre-compiled stan models per platform + python version means the end-user does not need to perform the build process on their machine. They won't even need their own cmdstan installation since we will package the necessary executables into Prophet itself.
    • Only the lightweight wrapper library cmdstanpy is required as a dependency.
  • Remove the dependency on pystan altogether.

This should not change how end-users typically interact with the Prophet package. The general flow of

m = Prophet()
m.fit(df)
future = m.make_future_dataframe()
m.predict(future)

will remain unchanged, and all other diagnostic functions will work as normal.

It's just that behind the scenes, the model object will now be a cmdstanpy object rather than a pystan one. The argument stan_backend='PYSTAN' will also be deprecated. This will affect end users who retrieve the actual pystan object and work with it directly, but we expect this to be the minority.

We have already refactored the setup.py script to accomodate this new process (it currently supports both pystan 2.19.1.1 and cmdstanpy for backwards compatibility).
We have also built and tested PyPi wheels for macOS and Linux for Python versions 3.6-3.8. If you follow the instructions here, you can test these binary distributions on your machine.

Work required

For the next minor release (prophet 1.1.0), we should consider the following:

  • Update existing conda build scripts to use the cmdstan + cmdstanpy backend.
  • Remove the pystan model compilation from the build process and remove pystan as a Prophet dependency.
  • (Optional) Add support for a Windows wheel on PyPi.

excited to be the Python backend!

CmdStanPy pr stan-dev/cmdstanpy#461 changed behavoir of sampler so that CmdStanModel method sample argument show_progress defaults to True, previously default was False.

also, we're hoping that next release of CmdStanPy is going to be the 1.0 release.

FYI - CmdStanPy PR stan-dev/cmdstanpy#460 is removing deprecated functions.
I checked class IStanBackend - this won't affect it.

FYI, CmdStanPy 1.0.0rc1 is up. Methods, arguments, and properties which were previously deprecated have been removed. The addition of progress bars to the sample method and the cmdstan_install methods will change the console output behavoir; these progress bars use the tqdm library.

More information is available here: https://discourse.mc-stan.org/t/cmdstanpy-1-0-0rc1/24973

Thanks heaps @mitzimorris! Sorry I haven't had the time to take a look. Do you have any examples of other packages using cmdstanpy on conda btw? It'll save me some time -- if not, all good.

Would be keen to know when the 1.0 release goes on PyPI as well! Given I've been slow on this I think we might as well wait for that and refactor the setup script.

answer from Brian who put together the conda package:
stan-dev/cmdstanpy#503 (comment)

@tcuongd and @mitzimorris , FYI the 1.0 release of cmdstanpy is available on pypi

https://pypi.org/project/cmdstanpy/

related discussion: stan-dev/cmdstanpy#475

Are you looking to compile the .stan files as part of the conda build? Or just depend on cmdstanpy and actually compile the executables at install or run time?

I'm happy to help with this if I can provide value

@WardBrian , yes that would be the idea. Would love the help since this issue is blocking a new release to pypi

Packaging pre-compiled stan models per platform + python version means the end-user does not need to perform the build process on their machine. They won't even need their own cmdstan installation since we will package the necessary executables into Prophet itself.

I'm not sure that it would even need to be per-Python version, since there is no direct interface between those programs and the python environment, it's all done via subshells etc.

I think this is relatively simple to do via conda - cmdstan can just be a build-time requirement and then you can build the models and store them with your package. This can be done either with cmdstanpy and then saving the executable, or actually just running the cmdstan commands directly inside bld.bat/build.sh. The only 'issue' I forsee is if you plan on using cmdstanpy as a runtime requirement, conda will install cmdstan anyway, even if you don't technically need it.

Is there an existing PR to the feedstock to tackle this?

Thanks Brian! Ideally we'd compile the stan files as a part of the conda build. There's no PR to the feedstock yet, I haven't started on it.

Reading through the feedstock and setup.py, I think you may have a bit of a chicken-egg problem about when you actually remove pystan. The feedstock uses the pypi source tarball rather than something like a github tag, so it can only depend on a version which is already released to pypi.

I think it's possible to build a working cmdstanpy version based off 1.0.1 (and if you remove pip check from the tests you probably won't even need pystan), but then I think you'll want to fairly rapidly think about the next version where pystan is not the default and clean up some of the code in setup.py as it relates to cmdstan (I noticed there are already some TODO comments about this). If you remove PyStan completely I believe it makes the conda build much simpler as you'll no longer need all the run time packages in the build environment

I'll make a few attempts at updating the recipe today and see if I can get anywhere

I should note that I think the changes to setup.py made in #2010 will need to be re-thought or reverted. The assumptions it makes about the cmdstan path will not hold true in the conda environment (you cannot in general hard-code ~/.cmdstan as the path), and the extra work it does (rebuilding cmdstan, 'pruning' the cmdstan folder after the fact) will also be problematic I believe

I think the previous method of the very simple build_model method would work better. If you're concerned about previously not having cmdstan installed, you can call something like cmdstanpy.cmdstan_path, catch its exception and call cmdstan.install_cmdstan, and then proceed

Tests over at conda-forge/prophet-feedstock#21 are green, so as a proof of concept this works for 1.0.1

I think given the current state of master, this would break in the next release. The functionality in setup.py that moves/repackages CmdStan should be configurable (something like an environment flag REPKG_CMDSTAN would be great), such that disabling it would just assume you have a working cmdstanpy installation and uses that instead (like is done in 1.0.1's released version). The logic in the actual Backend class would need to check that the folder exists before setting cmdstan to that version, but that's also reasonable.

If you set it up in that way you could build wheels with it enabled, but it is disabled for conda or power users building from source. Plus, it should make testing with different CmdStan versions a lot easier if you allow the model to be arbitrarily rebuilt, so everybody wins in the end.

@tcuongd , let me know if there is anything I can do to help with that reorganization. I'm not as familiar with building wheels as I am with conda recipes, but the logic in setup.py should be pretty simple. If you'd like, we can also have the conda recipe target git tags from github to test before the release of the next version.

@WardBrian , I think that makes sense for setup.py. Please feel free to re-organize setup.py. The logic for a setup.py file is usually straightforward.

I like the environment flag to build the wheel by default, with an override if a power user wants to build from source.

I am currently waiting to hear back from my employer about whether the Facebook CLA is compatible with our IP policy.

A question that may be related to this change: what originated the use of a different model on windows versus unix? I believe CmdStan should be able to compile the 'unix' model on all platforms

I don't believe there was ever a compelling reason for a different Windows vs Unix model. Probably just general Windows difficulties. By pre-compiling the model vs having the user compile the model, all of the Windows issues with C++ compilers are sidestepped. If you look through the Prophet issues, there's loads of "can't install on Windows" issues related to model compilation.

If you are able to use the same model for all platforms, that would be the best way forward and easiest to maintain.

With #2088 now merged, I guess the next step would be getting rid of the pytstan dependency and imports? Is there much work to do there or is it just a matter of removing all pystan references in the code (and eventually documentation)?

In #2088 we discussed the next steps. Beyond just removing the PyStan references, there is also some refactoring work which can be done since the abstraction of a “backend” is not really needed any longer. I don’t think that would need to happen right away though, it’s probably fine to leave IStanBackend (and just only have one option for it) for the time being