florisvb/PyNumDiff

Duplicate package?


Hey, I've been working on pysindy and realized that the Kutz/Brunton lab is associated with two packages for "numerical differentiation of noisy time-series data", both citing the same Chartrand paper. Currently, pysindy uses derivative, which I've been contributing to occasionally over the past year. I wanted to ask the authors (@florisvb, @luckystarufo, @andgoldschmidt) whether it's worth combining the packages, especially since both projects were supervised by the same advisors. From the perspective of the Python ecosystem, I think it would be better: less maintenance cost, and item 13 of the Zen of Python:

"There should be one-- and preferably only one --obvious way to do it."

Obviously everyone's welcome to their own packages, however.

At first blush, here's the comparison:

  • pysindy depends upon derivative
  • 163 GitHub repos list derivative in a pyproject.toml, setup.py, or requirements.txt (excluding derivative itself) vs. 0 for pynumdiff (cursory search). EDIT: found one for pynumdiff, motiontrackerbeta.
  • pynumdiff has 84 stars whereas derivative has 37.
  • pynumdiff has a JOSS paper.
  • derivative has more smoothing/derivative methods than pynumdiff (e.g. spectral, spline...)
  • pynumdiff appears to have more advanced/complex implementations of TV and Kalman.
  • pynumdiff has more fragile dependencies to satisfy (e.g. numpy vs cvxpy and pychebfun, which hasn't been updated since 2017)
  • derivative has around 650 daily downloads, whereas pynumdiff has only 15-20 (many of derivative's downloads are likely because pysindy requires it).
  • derivative is older by about a year (2019?)

I'll be working on pysindy and numerical differentiation over the next (and hopefully last) year of my PhD, contributing code to a variety of repos. Obviously, both packages have MIT licenses. But I want to poll the authors' feelings on merging code and, long term, deprecating one of the packages, especially before I start opening issues like "hey, I'm trying to copy your code, why does foo() call bar()?".

Floris, thanks for that excellent paper and context! It does look like your implementations and algorithm coverage are as good as or better than those in derivative.

For my PhD research, I proposed combining the derivative/smoothing optimization problem and the SINDy optimization problem into a single multi-objective optimization (sort of along the lines of the future work recommended in your conclusion!). I started down that track by doing a lot of refactoring to improve how derivatives are used in pysindy. When I began running experiments to compare different derivative methods in SINDy, Nathan suggested that I google other packages for the total variation derivative, as the results using pysindy/derivative's TV method weren't competitive with Kalman/Savitzky-Golay. Hence my surprise at finding that Nathan's also an author on the PyNumDiff paper.
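Comparisons like the one described above come down to scoring each method against a known derivative. Here's a minimal benchmark sketch, assuming only numpy and scipy: it does not call pysindy, derivative, or pynumdiff, and it swaps in two simple stand-ins (np.gradient finite differences and scipy's savgol_filter) scored by RMSE on a noisy sine. The function names and the noise level, window, and polynomial order are illustrative choices, not tuned values from either package.

```python
import numpy as np
from scipy.signal import savgol_filter


def finite_difference(x, dt):
    """Central differences in the interior (one-sided at the ends) via np.gradient."""
    return np.gradient(x, dt)


def savitzky_golay(x, dt, window=31, order=3):
    """Derivative of a sliding least-squares polynomial fit (Savitzky-Golay)."""
    return savgol_filter(x, window_length=window, polyorder=order, deriv=1, delta=dt)


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


def benchmark(methods, noise=0.05, n=1000, seed=0):
    """Score each method f(x, dt) -> dxdt against the exact derivative of sin(t)."""
    t = np.linspace(0.0, 4 * np.pi, n)
    dt = t[1] - t[0]
    rng = np.random.default_rng(seed)
    x = np.sin(t) + noise * rng.standard_normal(n)
    true_dxdt = np.cos(t)
    return {name: rmse(f(x, dt), true_dxdt) for name, f in methods.items()}


methods = {"finite_difference": finite_difference, "savitzky_golay": savitzky_golay}
scores = benchmark(methods)
```

Finite differences amplify the measurement noise by roughly 1/dt, so on noisy data the smoothing method should win by a wide margin; with noise=0.0 both should be close to exact. The same harness shape would work for comparing two competing implementations of one method.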

Also, deprecation isn't really an official thing for Python packages. As long as nobody specifically deletes releases, they remain on PyPI in perpetuity (and even a yanked release can still be installed with an exact == pin). Thus, anything that depends on pynumdiff==0.5.3 would continue working in any deprecation scenario. Deprecation is just manually adding a note at the top of README.md: "This package has been deprecated. Continued support is maintained by this other package (GH link)".
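Beyond the README note, a deprecated package can optionally warn at import time. A minimal sketch, assuming the call would sit in the deprecated package's __init__.py; the helper name and the package names are hypothetical placeholders, since which package would be deprecated is still undecided.

```python
import warnings


def warn_deprecated(old_name: str, successor_name: str) -> None:
    """Emit a DeprecationWarning pointing users at the successor package.

    Hypothetical helper: intended to be called once, at module level, in the
    deprecated package's __init__.py so it fires on import.
    """
    warnings.warn(
        f"{old_name} has been deprecated. "
        f"Continued support is maintained by {successor_name}.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller of this helper
    )


# Usage: record the warning instead of letting the filters decide whether to print it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_deprecated("oldpkg", "newpkg")
```

Python's default filters hide a DeprecationWarning unless it is attributed to code in __main__, but test runners such as pytest report it, which is usually where downstream maintainers would notice. So this complements the README note rather than replacing it.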

I've never merged packages before, so I'm taking a stab, but here's my best guess:

  1. I update the pysindy.differentiation API in my PR over there (currently, any callable that returns derivatives works; the open question is how to also return the smoothed coordinate values).
  2. I migrate the derivative wrapper from pysindy back to derivative and add similar wrappers here in PRs.
  3. I do some housekeeping PRs here to ensure dependency compatibility with derivative (currently everything here is pinned. Also, it looks like you're running Travis CI, but I don't see a link? Would you be OK with switching to GitHub Actions for CI?)
  4. In the successor package, submit pull requests that (a) import the other package, (b) expose everything imported from the other package in the top-level package namespace (there don't appear to be any name conflicts), and (c) copy all the tests from the other package and verify that they run.
  5. Build derivative-style and pynumdiff-style wrappers for each smoothing/differentiation method.
  6. Copy the rest of the other package into the successor package, updating from pynumdiff import x to from .pynumdiff import x (or from derivative import x to from .derivative import x), and verify that the tests pass.
  7. Evaluate competing implementations with a benchmark test and update wrappers to point to better implementation, removing the lesser implementation.
  8. Mark the other package as deprecated in README.md.
  • Caveat: I don't know how the Pareto optimization from your paper (pynumdiff.optimize) would extend to new methods, e.g. the kernel smoothing that some folks in the Kutz/Brunton orbit are doing, which I'm trying to add to derivative. That seems like a neat and useful tuning method.
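Steps 1 and 5 hinge on bridging two calling conventions: one that returns only the derivative, and one that also returns the smoothed signal. A minimal sketch, assuming simplified conventions f(x, t) -> x_dot and f(x, dt) -> (x_smooth, x_dot); these are not the real signatures of derivative or pynumdiff, and gradient_method and moving_average_method are hypothetical toy methods.

```python
from typing import Callable, Tuple

import numpy as np

# Hypothetical, simplified conventions (not either package's actual API):
#   "derivative-style": f(x, t) -> x_dot
#   "pynumdiff-style":  f(x, dt) -> (x_smooth, x_dot)
DerivativeStyle = Callable[[np.ndarray, np.ndarray], np.ndarray]
PynumdiffStyle = Callable[[np.ndarray, float], Tuple[np.ndarray, np.ndarray]]


def to_pynumdiff_style(f: DerivativeStyle) -> PynumdiffStyle:
    """Adapt a derivative-only method; with no smoothed values, return x unchanged."""
    def wrapped(x, dt):
        t = dt * np.arange(len(x))  # assumes uniform sampling starting at 0
        return np.asarray(x), f(x, t)
    return wrapped


def to_derivative_style(f: PynumdiffStyle) -> DerivativeStyle:
    """Adapt a (smoothed, derivative) method, discarding the smoothed values."""
    def wrapped(x, t):
        dt = float(t[1] - t[0])  # assumes uniformly spaced t
        _, x_dot = f(x, dt)
        return x_dot
    return wrapped


# Toy methods standing in for real implementations.
def gradient_method(x, t):
    return np.gradient(x, t)


def moving_average_method(x, dt, width=5):
    kernel = np.ones(width) / width
    x_smooth = np.convolve(x, kernel, mode="same")
    return x_smooth, np.gradient(x_smooth, dt)


# Usage: call each method through the other convention.
t = np.linspace(0.0, 1.0, 101)
x = t ** 2
x_smooth, x_dot = to_pynumdiff_style(gradient_method)(x, t[1] - t[0])
x_dot_ma = to_derivative_style(moving_average_method)(x, t)
```

Returning x unchanged for derivative-only methods keeps the tuple shape uniform, so downstream code (e.g. SINDy) never has to special-case methods that don't smooth.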

At a certain point, we'd have to decide, if a single package is the goal, which should be the successor package and which should be the deprecated package, and handle all the admin that entails. We could also go for a setup where derivative imports pynumdiff and we stop short of copying the code. Really, items 1, 2, and 3 are all that I'd need to do for my research; the rest is more about improving the package ecosystem around SINDy and numerical differentiation.
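The lighter-weight option (one package importing the other and re-exporting its public names, as in step 4(b)) only works cleanly if the two namespaces don't collide. A minimal sketch of checking for and then performing such a re-export; the helper names are hypothetical, and the stdlib modules math and cmath stand in for the two packages because they do share names.

```python
import cmath
import math
import types


def public_names(module: types.ModuleType) -> set:
    """Names `from module import *` would bring in: __all__ if defined, else non-underscore names."""
    names = getattr(module, "__all__", None)
    if names is None:
        names = [n for n in dir(module) if not n.startswith("_")]
    return set(names)


def reexport_conflicts(a: types.ModuleType, b: types.ModuleType) -> set:
    """Public names defined by both modules."""
    return public_names(a) & public_names(b)


def reexport(source: types.ModuleType, namespace: dict) -> None:
    """Copy source's public names into namespace (e.g. globals() of an __init__.py).

    Refuses to overwrite an existing name bound to a different object.
    """
    clashes = {
        n for n in public_names(source)
        if n in namespace and namespace[n] is not getattr(source, n)
    }
    if clashes:
        raise NameError(f"re-exporting {source.__name__} would shadow: {sorted(clashes)}")
    namespace.update({n: getattr(source, n) for n in public_names(source)})


# Usage: math and cmath both define sqrt, pi, etc., so re-exporting both would clash.
shared = reexport_conflicts(math, cmath)
```

Checking all clashes before writing anything keeps the namespace untouched when the re-export fails, rather than leaving it half-updated.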

-Jake

I agree on getting Nathan/Steve's input. I'm going to give a talk at the Brunton/Kutz group on May 8th about a variety of pysindy changes, including a discussion of which API changes would be breaking. In the interim, I'll try to PR some of item 3 above (CI and dependencies).

-Jake

Awesome, thanks Floris! The next step I'm working on is to refactor the package metadata and requirements into a pyproject.toml; invoking setup.py directly was deprecated a few years ago. The replacement approach also lets the version number be derived from the git tag at release time, so every package uploaded to PyPI automatically carries the correct version number without any manual step, and any in-between versions show git metadata in the version string.