Here are the steps to build a Singularity container for Keras with TensorFlow-GPU on a local computer, for use on a cluster.
- Install VirtualBox; use v5.1 or an older version, since Vagrant 2.0.0 is not compatible with VirtualBox 5.2.
- Install Vagrant 2.0.0.
- Install Vagrant Manager.
- Set up the Ubuntu 14.04 vm with Vagrant.
```shell
mkdir ~/ubuntu-vm
cd ~/ubuntu-vm
vagrant init ubuntu/trusty64 --box-version 14.04
```
- Enter the Ubuntu virtual machine.

```shell
cd ~/ubuntu-vm
vagrant up
vagrant ssh
```
- Inside the Ubuntu vm, install Singularity 2.3.1.

```shell
VERSION=2.3.1
wget https://github.com/singularityware/singularity/releases/download/$VERSION/singularity-$VERSION.tar.gz
tar xvf singularity-$VERSION.tar.gz
cd singularity-$VERSION
./configure --prefix=/usr/local
make
sudo make install
```
- Check the NVIDIA driver version on the cluster, and download the same driver version from the NVIDIA website inside the vm.

```shell
nvidia-smi
```
For me, I got:
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.59                 Driver Version: 384.59                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    On   | 00000000:02:00.0 Off |                  N/A |
| 39%   67C    P2   113W / 250W | 11636MiB / 12189MiB  |     35%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    On   | 00000000:03:00.0 Off |                  N/A |
| 23%   33C    P8    16W / 250W |     2MiB / 12189MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN X (Pascal)    On   | 00000000:82:00.0 Off |                  N/A |
| 33%   57C    P2    63W / 250W | 11636MiB / 12189MiB  |     20%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN X (Pascal)    On   | 00000000:83:00.0 Off |                  N/A |
| 31%   54C    P2    60W / 250W | 11636MiB / 12189MiB  |     21%      Default |
+-------------------------------+----------------------+----------------------+
```
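If you want to script the lookup, the driver version can be pulled out of that banner with sed. The sample line below is copied from the output above; on the cluster you can pipe `nvidia-smi` itself through the same sed command:

```shell
# Sample banner line from the nvidia-smi output above.
LINE='| NVIDIA-SMI 384.59                 Driver Version: 384.59                    |'
# Keep only the version number that follows "Driver Version:".
echo "$LINE" | sed -n 's/.*Driver Version: \([0-9.]*\).*/\1/p'
# prints 384.59
```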
So I downloaded NVIDIA-Linux-x86_64-384.59. It is a bit tricky to find, but the URL pattern should be http://us.download.nvidia.com/XFree86/Linux-x86_64/$VERSION/NVIDIA-Linux-x86_64-$VERSION.run.
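Substituting the version into that pattern can be done directly in the shell; the wget is left commented out here since the actual download only makes sense inside the vm:

```shell
# Build the driver download URL from the version found on the cluster.
VERSION=384.59
URL="http://us.download.nvidia.com/XFree86/Linux-x86_64/$VERSION/NVIDIA-Linux-x86_64-$VERSION.run"
echo "$URL"
# prints http://us.download.nvidia.com/XFree86/Linux-x86_64/384.59/NVIDIA-Linux-x86_64-384.59.run
# wget "$URL"
```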
- Check the CUDA version on the cluster, and download the corresponding CUDA installer from the NVIDIA website inside the vm.

```shell
nvcc --version
```
For me, I got:
```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
```
So I downloaded cuda_8.0.61 from https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda_8.0.61_375.26_linux-run. It turned out to be fine that, when using a run file, the NVIDIA driver version bundled with the CUDA installer differs from the driver on the cluster (375.26 != 384.59).
- Store the downloaded files and the above scripts in the same folder in the vm.
- Run:

```shell
sh build.sh
```
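build.sh and the bootstrap definition it drives come from the repository linked in the notes at the end; with Singularity 2.3 the build boils down to a `singularity create` followed by a `sudo singularity bootstrap` against a definition file. A minimal sketch of such a definition is below; the base image and package versions are illustrative assumptions, not the repository's actual file:

```
BootStrap: docker
From: ubuntu:14.04

%post
    # Illustrative only: install pip, then tensorflow-gpu and keras.
    # The repository's real definition also installs the CUDA toolkit
    # from the run file downloaded in the steps above.
    apt-get update && apt-get install -y python-pip
    pip install tensorflow-gpu==1.1.0 keras
```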
- Copy the resulting tensorflow_gpu-1.1.0-cp27-linux_x86_64.img onto the cluster; vagrant scp can be used to copy files out of the vm.
- Run the container on the cluster:

```shell
singularity shell --nv tensorflow_gpu-1.1.0-cp27-linux_x86_64.img
```
- This is adapted from https://github.com/jdongca2003/Tensorflow-singularity-container-with-GPU-support.
- This assumes that the NVIDIA driver is installed on the cluster.
- When building the ubuntu-vm, give it at least 2GB of memory; otherwise, the installation of some python libraries may fail.
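The memory requirement in the last note can be set in the Vagrantfile (created by `vagrant init` above) before running `vagrant up`. A sketch of the relevant provider block, using standard Vagrant VirtualBox provider settings:

```ruby
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "2048"  # at least 2GB, per the note above
  end
end
```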