uber/neuropod

Support for newer versions of TorchScript

tgaddair opened this issue · 14 comments

Hey @VivekPanyam, we're still big Neuropod users at Ludwig. We are interested in using the TorchScript backend, but it seems 1.5.0 is still the most recent version with published wheels:

https://download.neuropod.ai/whl/stable.html

Is there a way to use a more recent version of TorchScript?

cc @brightsparc

Hey @tgaddair!

The method of installation using the link you mentioned is a bit out of date.

See the docs at https://neuropod.ai/docs/master/installing/

Basically, run pip install neuropod==0.3.0rc5 and then install a backend as described in the "backends" section of the link above. The short version of why backend installation is separate from pip install ... is that it lets us share backends across all the languages you can use Neuropod from (e.g. C, C++, Java, Python, Go, etc.). This means you only need to install support for a framework once and it'll work from any language.

For the 0.3.0-rc5 release, we support up to TorchScript 1.9.0 (https://github.com/uber/neuropod/releases/tag/v0.3.0-rc5).

Neuropod's release cycle has been a little weird in that we've released a bunch of RCs since the initial public release, but haven't put out a new "stable" release (although the RCs are generally pretty stable). Hopefully we'll put out a stable release soon, but tl;dr running off the latest RC should be fine for now.

Let me know if you have any thoughts or questions! Also curious if you have any thoughts on a good release cadence. Maybe major releases with breaking changes at most quarterly and minor releases whenever?

Thanks @VivekPanyam, that worked!

I'm personally a big fan of minor releases whenever there's a bug fix or change to support a newer framework version, and major releases whenever a significant feature is added. Quarterly sounds pretty reasonable to me in general.

It does seem we're relying on some features that are only in TorchScript 1.10. Is it possible to add a 1.10 backend in the near future?
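For anyone hitting the same version gap, a small runtime guard can fail with a clearer message than a TorchScript compile error deep inside a model. This is just an illustrative sketch, not part of Neuropod's or Ludwig's API; the function names and the guard itself are assumptions:

```python
# Illustrative sketch (not Neuropod API): fail early with a clear message
# when the installed torch is older than what a model's features require.

def parse_version(version):
    """Turn a version string like '1.10.2' into a (major, minor) tuple."""
    return tuple(int(part) for part in version.split("+")[0].split(".")[:2])

def check_torch_version(installed, minimum):
    """Raise if the installed torch version is below the required minimum."""
    if parse_version(installed) < parse_version(minimum):
        raise RuntimeError(
            f"This model uses TorchScript {minimum}+ features, "
            f"but torch {installed} is installed."
        )

check_torch_version("1.10.2", "1.10")  # OK: 1.10.2 satisfies the 1.10 minimum
```

In real code you would pass `torch.__version__` as `installed`; the comparison only looks at major/minor, which is what matters for backend compatibility here.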

For sure! I went ahead and put up a PR to add support for Torch 1.10.2: #531

Once builds pass and it gets merged in, I'll make another RC release :)

Awesome, thank you very much!

Of course! I landed support for Torch 1.10.2, and I'll put out a new RC tomorrow (in case there are some other changes we land by then).

Sounds great!

Just a heads up, I'm going to wait until #533 lands before putting out an RC. Let me know if you'd rather I make a release earlier.

Released v0.3.0-rc6: https://github.com/uber/neuropod/releases/tag/v0.3.0-rc6

Installation instructions: https://neuropod.ai/docs/master/installing/

Let me know if you need anything else!

Thanks @VivekPanyam, I've been able to test this on Linux successfully. However, I am unable to pip install on my osx-arm64 platform; do you plan on supporting this in the future?

Unfortunately no plans at the moment. No one has requested M1 support yet, but if it's important to your team, happy to chat with you and/or @tgaddair.

Adding support for another platform is a lot of work, so I'd want to understand your use case more. For example, if it's for testing or development, maybe there's a Docker or Rosetta solution that would work.

For context, a few things that make this complicated:

  • All our backends would need to support arm64 on Mac (e.g. libtensorflow, libtorch, isolated python interpreters + all transitive deps)
  • There aren't arm64 Mac prebuilts of libtorch or libtensorflow. We could build them from source, but unfortunately, based on GitHub issues, M1 builds from source seem a bit finicky for both Torch and TF. Also, building 5 versions of libtensorflow and 8 versions of libtorch from source would be fairly time-consuming.
  • Support for old versions of frameworks (e.g. TF 1.15) may be tricky because they were released before M1 devices were available. This means building from source may not help us sidestep the "no prebuilts" issue.
  • GitHub Actions doesn't provide M1 runners, so we'd need an alternate solution for testing in CI

There are a few more issues that come to mind, but those are some of the big ones. All of them are solvable, but at the moment, I'm not aware of a compelling use case that warrants the effort (vs. a simpler alternative like Rosetta or Docker).

So far, we haven't had a "prod" Mac use case (i.e. where perf is important), but please let me know if you have one!

(Context: Neuropod initially started as Linux-only (because at ATG, we'd develop on Linux and our target platform was also Linux). For wider adoption (at Uber proper), we added Mac support because people would develop on Mac and run in prod on Linux.)

Thanks for the detailed explanation, that makes sense. Noted on the Mac distribution, and I realize M1 support is non-trivial. We have it as an option in Ludwig, which is often run locally for development, so we will provide relevant instructions to work with Rosetta.
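A small platform check can surface the Rosetta/Docker fallback proactively rather than letting the install fail. A minimal sketch, with the caveat that the helper and its messages are hypothetical (not from Ludwig or Neuropod), and the fallback advice assumes an x86_64 Python under Rosetta 2 or a linux/amd64 container is acceptable:

```python
# Hypothetical helper (not from Ludwig or Neuropod): decide whether native
# wheels are expected for a platform, or whether a fallback should be suggested.
import platform

def wheel_hint(system=None, machine=None):
    """Return 'native' where prebuilt wheels exist, else a fallback hint."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        # No arm64 macOS wheels yet: suggest an x86_64 Python under
        # Rosetta 2, or a linux/amd64 Docker container.
        return "rosetta-or-docker"
    return "native"

print(wheel_hint("Darwin", "arm64"))   # -> rosetta-or-docker
print(wheel_hint("Linux", "x86_64"))   # -> native
```

Called with no arguments it inspects the current interpreter, which is also how you'd verify that a Rosetta-launched Python really reports x86_64.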

Hey @VivekPanyam, we've been using the new RC with PyTorch 1.10 support. Works great! What do you think about cutting a new release? Seems everything is pretty stable at the moment.

(We were talking about v0.3.0-rc6 in the thread above so I assume you're referring to cutting a major release.)

Sorry about the delay in responding. I wanted to get the blockers for a major release on paper before responding but that took longer than expected.

See #555 for details.

The short version is (1) getting to a point where we've decided on the ABI compatibility guarantees we want to make, and (2) having checks in CI to ensure we don't accidentally break ABI compatibility.

Maybe a good intermediate step is to just cut a v0.3.0 release. The things that need to happen for that are release notes and updated docs so it should be a little more straightforward.

(Tangentially related: I released v0.3.0-rc7 to GitHub and PyPI over the last week, in case that's helpful to you.)