anibali/docker-pytorch

Is CUDA being downloaded twice?

Closed this issue · 4 comments

I'm checking out the Dockerfile here: https://github.com/anibali/docker-pytorch/blob/master/dockerfiles/1.8.1-cuda11.1-ubuntu20.04/Dockerfile, and noticed that it uses:

  1. A base image including CUDA (nvidia/cuda:11.1.1-base-ubuntu20.04)
  2. An installation of PyTorch pre-packaged with CUDA (pytorch=1.8.1=py3.8_cuda11.1_cudnn8.0.5_0)

Is my understanding right, or do the CUDA resources in the base image differ from those bundled with the PyTorch package, such that they aren't actually redundant?
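For reference, the two CUDA sources described above correspond roughly to these lines of the Dockerfile (a paraphrased sketch, not a verbatim copy; the exact `cudatoolkit` pin is an assumption based on the conda package string):

```dockerfile
# CUDA source 1: the NVIDIA base image (the minimal "-base" flavour).
FROM nvidia/cuda:11.1.1-base-ubuntu20.04

# CUDA source 2: conda installs cudatoolkit alongside PyTorch, which
# provides the CUDA runtime libraries that PyTorch actually links against.
RUN conda install -y -c pytorch \
    pytorch=1.8.1=py3.8_cuda11.1_cudnn8.0.5_0 \
    cudatoolkit=11.1
```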

I'm pretty sure it is mostly redundant, but I can't see a way around it, since the PyTorch package requires cudatoolkit. I have avoided doubling up on cuDNN by not using the cuDNN base image, but I'm fairly sure that a CUDA base image is required for proper GPU handling (i.e. I don't think a stock Ubuntu base image would work). If you find a way of avoiding the redundancy, please let me know.
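The cuDNN deduplication mentioned above comes down to which base-image flavour is chosen. Sketched out (the alternative tag shown is one of NVIDIA's published variants, included here only for contrast):

```dockerfile
# Chosen: the minimal "-base" flavour, which does NOT bundle cuDNN,
# so the only cuDNN copy in the final image is the one conda installs
# via the pytorch=...cudnn8.0.5... package.
FROM nvidia/cuda:11.1.1-base-ubuntu20.04

# Not chosen: this flavour would add a second, system-wide cuDNN copy
# on top of the conda one.
# FROM nvidia/cuda:11.1.1-cudnn8-runtime-ubuntu20.04
```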

It seems like the only way to avoid this is to build from source?

@jbohnslav Right, that would work but would also require some fairly major changes to this project. I'll probably look into it at some point.

The NVIDIA CUDA base images are actually very small (e.g. https://hub.docker.com/layers/nvidia/cuda/11.3.1-base-ubuntu20.04/images/sha256-5a8e4366bd66c2734183c8b2e58da108c2e74f9edb6005be7c828786132fef5a), so I'm going to close this issue until someone has a good, concrete suggestion. Building PyTorch from source would complicate things considerably, and I'm concerned that there would be no real gain, along with potential compatibility issues.