pangeo-data/pangeo-stacks

adding GPU-enabled tensorflow and pytorch


We would like to do deep learning on GPU nodes in Google Cloud. (I know, just use Colaboratory, right?)

There is a blog post from Anaconda describing the performance benefits of using the conda build of TensorFlow. Anaconda also provides a tensorflow-gpu package.

However, none of this is available on conda-forge. There is a long issue explaining why it is hard, if not impossible, to build GPU-enabled packages on conda-forge.

So what should we do? Is it feasible to switch our whole notebook image to the defaults channel rather than conda-forge?

One possible way out is to have a second conda environment built from the defaults channel.
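For the second-environment route, here is a minimal sketch in Python, assuming `conda` is on the PATH. The environment name `ml`, the Python pin and the `ipykernel` entry are hypothetical choices for illustration; the `tensorflow-gpu` and `pytorch-gpu` names are the ones suggested later in this thread. The `--override-channels` flag makes conda ignore channels configured in `.condarc` (such as conda-forge) and search only the channel given with `--channel`.

```python
"""Sketch: create a second conda environment from the defaults channel only.

Hypothetical: the environment name "ml" and the package list. The GPU package
names (tensorflow-gpu, pytorch-gpu) are the ones suggested in this thread.
"""
import subprocess
from typing import List, Sequence

# Hypothetical package list; ipykernel lets the env be registered as a Jupyter kernel.
DEFAULT_PACKAGES = ("python=3.7", "ipykernel", "tensorflow-gpu", "pytorch-gpu")


def conda_create_cmd(name: str, packages: Sequence[str]) -> List[str]:
    """Build a `conda create` command that searches only the defaults channel."""
    return [
        "conda", "create", "--yes", "--name", name,
        # --override-channels ignores channels configured in .condarc
        # (e.g. conda-forge), so only the channel given with --channel is used.
        "--override-channels", "--channel", "defaults",
        *packages,
    ]


def create_env(name: str = "ml", packages: Sequence[str] = DEFAULT_PACKAGES) -> None:
    """Run the command; raises CalledProcessError if conda fails."""
    subprocess.run(conda_create_cmd(name, packages), check=True)
```

Calling `create_env()` during the image build would run `conda create --yes --name ml --override-channels --channel defaults python=3.7 ipykernel tensorflow-gpu pytorch-gpu`; the new environment would still need to be registered as a Jupyter kernel separately.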

@jsadler2 and I are interested in putting this image together. I think it is as simple as adding a new image based on defaults rather than conda-forge. I'm guessing this will take some trial and error, but the steps are probably similar to:

  1. Copy the base-notebook directory to ml-notebook.
  2. Edit ml-notebook/binder/environment.yml to include the libraries we think we'll need (probably similar to base-notebook + pangeo-notebook, but using the conda defaults channel). We'll also add the GPU versions of tensorflow and pytorch.
  3. Commit these files to a new branch and open a pull request.
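Steps 1 and 2 can be sketched in Python. This assumes the layout named above (`base-notebook/binder/environment.yml`), an environment file with one `- entry` per line, and that a simple line-based rewrite (no YAML parser) is good enough. It switches a `conda-forge` channel entry to `defaults` and swaps `tensorflow`/`pytorch` for the `-gpu` names suggested later in the thread, keeping any version pins. Libraries from pangeo-notebook or anything else ml-notebook needs would still be added by hand.

```python
"""Sketch of steps 1-2: derive ml-notebook from base-notebook.

Assumptions (not from the original thread): the environment file lists one
"- item" per line, and a line-based rewrite is acceptable (no YAML parser).
"""
import re
import shutil
from pathlib import Path

# CPU package name -> GPU package name, as suggested in this thread.
GPU_SWAPS = {"tensorflow": "tensorflow-gpu", "pytorch": "pytorch-gpu"}

# Matches a list entry such as "  - pytorch>=1.3": (indent + dash, name, rest).
ENTRY = re.compile(r"^(\s*-\s*)([A-Za-z0-9_.\-]+)(.*)$")


def rewrite_line(line: str) -> str:
    """Swap the conda-forge channel for defaults and CPU packages for GPU ones."""
    m = ENTRY.match(line)
    if not m:
        return line  # keys like "channels:" or "name: ..." pass through unchanged
    prefix, name, rest = m.groups()
    if name == "conda-forge":
        return f"{prefix}defaults{rest}"
    # Note: this does not distinguish entries nested under "- pip:".
    return f"{prefix}{GPU_SWAPS.get(name, name)}{rest}"


def make_ml_notebook(repo: Path) -> Path:
    """Copy base-notebook to ml-notebook and rewrite its environment.yml."""
    src = repo / "base-notebook"
    dst = repo / "ml-notebook"
    shutil.copytree(src, dst)  # step 1; raises if ml-notebook already exists
    env_file = dst / "binder" / "environment.yml"
    lines = env_file.read_text().splitlines()
    env_file.write_text("\n".join(rewrite_line(line) for line in lines) + "\n")
    return env_file
```

Step 3 (branch and pull request) is left to git, and the rewritten file is a starting point for the trial and error mentioned below rather than a finished environment.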

I think the trial and error will come in getting the environment file right.

As of December 2019, TensorFlow and PyTorch GPU packages are available in both the default Anaconda channel and conda-forge, so you can just replace tensorflow with tensorflow-gpu. However, a number of other changes are needed to get GPU utilization working (most obviously, defining a GPU worker pool).

I have some working examples in my forks:
Pangeo-ML (setup)
Pangeo-ML stacks
Pangeo-ML helm chart

> As of December 2019, tensorflow and pytorch gpu packages are available in both the default anaconda channel as well as conda forge.

This is not the case; there are no GPU packages on conda-forge.

Yeah, my mistake. I didn't realize that conda still automatically pulls packages from the default Anaconda channel when you add conda-forge as a channel during install.

Regardless, you can replace tensorflow with tensorflow-gpu and pytorch with pytorch-gpu, and conda will pull these packages from the default repository. It worked just fine in my setup.