Build a deep learning workstation from scratch. This document is written for Ubuntu 14.04 with TensorFlow, but most steps should also apply to other Ubuntu versions and deep learning frameworks.
Have fun DIY-ing!
Disclaimer: This document records my own experience and the lessons I learned building a workstation for deep learning. However, there is no guarantee of safety or success. It is your own responsibility to stay safe during the process. Please consult a professional IT service if you have questions or run into trouble.
Feel free to skip this section if you already have a GPU machine or plan to buy a pre-assembled one.
In general, you need a motherboard, a CPU (with a CPU fan), memory modules (RAM), a power supply unit (PSU), a hard drive (plus an SSD), a case and graphics cards (GPUs). A good place to pick and compare computer parts is PC Part Picker. The website checks the compatibility of components and compares prices from multiple retailers, which is really helpful.
You can find my part list HERE. The parts I bought are not exactly the same as the list because, for convenience, I bought everything from Fry's and chose parts depending on availability. But it should at least give you a sense of how much it costs and which parts are necessary. In my case it cost around $2.2k (May 2017) in total for a machine with two GTX 1080 GPUs, 32GB memory, a 500GB SSD + 4TB hard drive, a quad-core i5 CPU and an 850W power supply. Note that this spec is optimized for cost under the hard constraint of keeping two GPUs; the CPU, SSD and memory are where I compromised.
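To sanity-check that an 850W PSU is enough for two GPUs, a rough power budget can be sketched as below. Only the 180 W figure for a GTX 1080 is NVIDIA's published board power; the CPU and "everything else" numbers are placeholder assumptions, so substitute the values for your own parts.

```shell
#!/usr/bin/env bash
# Rough PSU budget: sum the rated power of the main components.
# Usage: estimate_draw <gpu_watts> <gpu_count> <cpu_watts> <other_watts>
estimate_draw() {
  local gpu_w=$1 gpu_count=$2 cpu_w=$3 other_w=$4
  echo $(( gpu_w * gpu_count + cpu_w + other_w ))
}

psu_w=850
# 180 W per GTX 1080; 91 W CPU and 75 W for drives, fans and motherboard are ASSUMED values.
draw_w=$(estimate_draw 180 2 91 75)
echo "estimated peak draw: ${draw_w} W"
echo "headroom on a ${psu_w} W PSU: $(( psu_w - draw_w )) W"
```

This is only a paper estimate; PC Part Picker also shows an estimated wattage for a part list.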
Read the manuals for the motherboard, power supply and case carefully before installation. There are also plenty of videos online on how to assemble a machine, which might be helpful if you are doing it for the first time.
A brief guideline for the assembly steps is as follows.
- Install the CPU on the motherboard, install the CPU fan and make sure it is firmly seated, then connect the fan power cable to the motherboard
- Install the memory modules on the motherboard
- Connect cables to the PSU: motherboard power (two cables, one for the CPU and one for the main motherboard connector; read the text on the cables and don't mix them up with the PCIe power cables!), hard drive power (one for the HD, one for the SSD) and GPU power (2 cables); then put the PSU into the case
- Install the motherboard into the case; fit the IO shield (the plate for USB, Ethernet etc.) before putting in the motherboard
- Connect the power, reset, LED etc. jumper cables to the motherboard, connect the case fan power cables to the motherboard, connect the motherboard power cables
- Install the HD and SSD, connect their power cables
- Install the GPUs, connect their power cables
- Plug the power cord into the wall outlet, start the machine, and enter the BIOS screen to check that all hardware is detected and works normally :)
Tip: if you find it extremely difficult to connect some cable, you are probably doing it in the wrong way!
I assume that at this step you already have a functional machine connected to a display, keyboard and mouse.
If you or a friend already have a computer, you can prepare a USB stick for OS installation following this guideline: Create A USB Stick on Ubuntu, or the equivalents for Windows and macOS.
The 14.04 ISO file (ubuntu-14.04.5-desktop-amd64.iso) can be found here: http://releases.ubuntu.com/14.04/
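Before writing the ISO to the USB stick, it is worth checking that the download is not corrupted. The sketch below assumes you are on Linux (with sha256sum available) and have also downloaded a SHA256SUMS checksum list from the same release page; verify_iso is a hypothetical helper name, not part of any Ubuntu tool.

```shell
#!/usr/bin/env bash
# Verify a downloaded file against a checksum list in "HASH  FILENAME" format.
# Usage: verify_iso <iso-file> <sums-file>
verify_iso() {
  local iso=$1 sums=$2 expected actual
  # Take the hash from the line that names this file.
  expected=$(grep -F "$(basename "$iso")" "$sums" | awk '{print $1}' | head -n 1)
  actual=$(sha256sum "$iso" | awk '{print $1}')
  # Succeed only if an expected hash was found and it matches.
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Example (both files assumed to be in the current directory):
# verify_iso ubuntu-14.04.5-desktop-amd64.iso SHA256SUMS && echo "ISO OK"
```

If the check fails, download the ISO again before creating the USB stick.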
Insert the USB stick into one of the machine's USB ports. Start the machine and it should automatically enter the Ubuntu installation screen. If your system has a preinstalled OS, you need to change the BIOS boot order so that the USB stick has first priority. The installation process should be fast (less than 10 minutes in my case) and simple.
This and the following steps require an Internet connection.
Install some useful packages in a terminal:
sudo apt-get update
sudo apt-get install \
aptitude \
freeglut3-dev \
g++-4.8 \
gcc-4.8 \
libglu1-mesa-dev \
libx11-dev \
libxi-dev \
libxmu-dev \
nvidia-modprobe \
python-dev \
python-pip \
python-virtualenv \
vim
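To confirm that the packages above were actually installed, a small check can be sketched as follows. It relies on dpkg-query, which ships with Ubuntu; is_installed and missing_packages are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Succeeds if dpkg reports the package as fully installed.
is_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "ok installed"
}

# Print every package from the argument list that is NOT installed.
missing_packages() {
  local pkg
  for pkg in "$@"; do
    is_installed "$pkg" || echo "$pkg"
  done
  return 0
}

# Example:
# missing_packages g++-4.8 gcc-4.8 nvidia-modprobe python-dev python-pip python-virtualenv
```

An empty output means nothing is missing.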
Download CUDA installation file: https://developer.nvidia.com/cuda-downloads
Choose Linux -> x86_64 -> Ubuntu -> 14.04 -> deb (local) -> Download
Install CUDA in terminal (use the specific .deb file you've downloaded):
cd ~/Downloads
sudo dpkg -i cuda-repo-ubuntu1404-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
Restart the computer to activate the CUDA driver. Your screen resolution should now automatically change to the highest resolution the display supports!
The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks, with optimizations for convolutions etc.
Register a (free) account on the NVIDIA website and log in to download the latest cuDNN library: https://developer.nvidia.com/cudnn
Choose the specific version of cuDNN (depending on what your preferred deep learning framework supports)
Choose Download cuDNN v5.1 (Jan 20, 2017), for CUDA 8.0
-> cuDNN v5.1 Library for Linux
Install cuDNN (by copying files :) in terminal:
cd ~/Downloads
tar xvf cudnn-8.0-linux-x64-v5.1.tgz
cd cuda
sudo cp -P lib64/* /usr/local/cuda/lib64/
sudo cp include/cudnn.h /usr/local/cuda/include/
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
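To check which cuDNN version ended up in the CUDA directory, you can read the version macros from the copied header. This is a sketch that assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the v5.x cudnn.h does; cudnn_version is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the cuDNN version found in a cudnn.h header as MAJOR.MINOR.PATCH.
# Usage: cudnn_version [path-to-cudnn.h]
cudnn_version() {
  local header=${1:-/usr/local/cuda/include/cudnn.h}
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
       END { if (major == "") exit 1; print major "." minor "." patch }' "$header"
}

# Example:
# cudnn_version
```

The function exits with a non-zero status if the header is missing or has no version macros.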
Add the following lines to your ~/.bashrc file (you can open it with gedit ~/.bashrc in a terminal):
export PATH=/usr/local/cuda/bin:$PATH
export MANPATH=/usr/local/cuda/man:$MANPATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
Then reload the file in the current terminal (run this command, do not add it to ~/.bashrc):
source ~/.bashrc
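If you prefer to script the ~/.bashrc change instead of editing by hand, the sketch below appends a line only when it is not already present, so running it twice does not duplicate the exports. append_once is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Append a line to a file unless an identical line already exists.
# Usage: append_once <line> <file>
append_once() {
  local line=$1 file=$2
  touch "$file"
  # -x: match whole lines, -F: treat the line as a fixed string, not a regex.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH unexpanded so it is written literally):
# append_once 'export PATH=/usr/local/cuda/bin:$PATH' ~/.bashrc
# append_once 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' ~/.bashrc
```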
To check the installation, print some GPU and driver information by:
nvidia-smi
nvcc --version
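If either command above reports "command not found", the PATH setup is the usual suspect. A minimal sketch for checking that the tools are reachable (check_tools is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Report whether each named tool is on PATH; return non-zero if any is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# check_tools nvidia-smi nvcc
# A compact per-GPU summary:
# nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
```

If nvcc is missing but nvidia-smi is found, the driver is installed and only the /usr/local/cuda/bin entry in PATH needs fixing.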
Follow the official TensorFlow page for installation: https://www.tensorflow.org/install/ Or install whichever deep learning framework you prefer :)
For example, to install TensorFlow 1.1 with GPU support for Python 2.7:
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade $TF_BINARY_URL
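After installation, you can check whether TensorFlow sees the GPUs. The sketch below wraps a short Python one-off in a shell function; it assumes a TensorFlow 1.x install, where device_lib.list_local_devices() is available. tf_list_devices is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the devices TensorFlow can use, one per line.
# Usage: tf_list_devices [python-interpreter]
tf_list_devices() {
  local py=${1:-python}
  "$py" -c 'from tensorflow.python.client import device_lib
for d in device_lib.list_local_devices():
    print(d.name)'
}

# Example:
# tf_list_devices
```

With a working GPU build you should see GPU entries listed in addition to the CPU; if only the CPU shows up, recheck the CUDA and cuDNN steps.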
I used the SSD as my boot disk and followed THIS document to mount the hard drive.
To check the disk status:
sudo fdisk -l /dev/sdb
To create a partition, start GNU parted as follows (assuming the hard drive is /dev/sdb):
sudo parted /dev/sdb
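Now, inside the command line interface of the parted tool, create a GPT partition table (needed for disks larger than 2TiB; note that this erases any existing partition table on the disk), set the unit to TB and create a partition from 0TB to 4TB. Then use print to check the partition and quit to exit the tool (parted writes changes to disk immediately):
(parted) mklabel gpt
(parted) unit TB
(parted) mkpart primary 0 4
(parted) print
(parted) quit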
Then use the mkfs.ext4 command to create an ext4 file system on the new partition:
sudo mkfs.ext4 /dev/sdb1
Use the following commands to mount /dev/sdb1 at /data:
sudo mkdir /data
sudo mount /dev/sdb1 /data
You can use df -H to check current disk info.
To keep the mount after reboot, add the mount setup to /etc/fstab:
sudo vim /etc/fstab
Add the following line at the end (the last field is 2 because 1 is meant for the root file system):
/dev/sdb1 /data ext4 defaults 0 2
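Device names such as /dev/sdb1 can change when drives are added or removed, so an fstab entry keyed on the partition's UUID is more robust. The sketch below builds such a line; fstab_entry is a hypothetical helper name, and it uses 2 for the fsck pass field, which fstab(5) recommends for non-root file systems.

```shell
#!/usr/bin/env bash
# Build an /etc/fstab line that identifies the partition by UUID.
# Usage: fstab_entry <uuid> <mountpoint> [fstype]
fstab_entry() {
  local uuid=$1 mountpoint=$2 fstype=${3:-ext4}
  printf 'UUID=%s %s %s defaults 0 2\n' "$uuid" "$mountpoint" "$fstype"
}

# Example (prints the line; paste it into /etc/fstab yourself):
# uuid=$(sudo blkid -s UUID -o value /dev/sdb1)
# fstab_entry "$uuid" /data
# sudo mount -a   # mounts everything in /etc/fstab; an error here means the entry is wrong
```

Running sudo mount -a before rebooting is a cheap way to catch a typo in /etc/fstab.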
On the server side, install the SSH server:
sudo apt-get install openssh-server
Edit SSH configuration to whitelist users:
sudo vim /etc/ssh/sshd_config
Change root login permission line to: PermitRootLogin no
Add allow users: AllowUsers your_username
Then restart SSH server:
sudo /etc/init.d/ssh restart
sudo ufw allow 22
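A typo in sshd_config can lock you out of remote access, so it helps to double-check the edited values. The sketch below prints the value of the first uncommented line for a keyword; sshd_setting is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the value of the first uncommented line for a keyword in an sshd config file.
# Usage: sshd_setting <keyword> [config-file]
sshd_setting() {
  local key=$1 file=${2:-/etc/ssh/sshd_config}
  # Keywords are case-insensitive; commented lines start with "#" and never match.
  awk -v key="$key" 'tolower($1) == tolower(key) { $1 = ""; sub(/^ +/, ""); print; exit }' "$file"
}

# Example:
# sshd_setting PermitRootLogin   # should print: no
# sshd_setting AllowUsers        # should print your username
# sudo sshd -t                   # sshd's own syntax check of the config file
```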
On the client side, to connect to the workstation you first need to know the server's IP (or hostname if it has one). Use ifconfig -a on the server to check the IP address (look for it under eth0).
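To pull just the IPv4 address out of the ifconfig output, a small filter can be sketched as follows. It handles both the "inet addr:…" format printed on Ubuntu 14.04 and the plain "inet …" format of newer ifconfig versions; parse_ipv4 is a hypothetical helper name, and the interface name eth0 may differ on your machine.

```shell
#!/usr/bin/env bash
# Read ifconfig output on stdin and print every IPv4 address found.
parse_ipv4() {
  sed -nE 's/.*inet (addr:)?([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*/\2/p'
}

# Example:
# ifconfig eth0 | parse_ipv4
```

With ifconfig -a the loopback address 127.0.0.1 is printed as well, so restricting the command to one interface keeps the output unambiguous.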
On the client side (macOS), you can map the server's hostname to its IP in /etc/hosts so that you can connect by name:
sudo vim /etc/hosts
Add line
<server IP> <server hostname>
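Once the entry is in place you can connect by name. The sketch below checks that a hostname is present in the hosts file before trying SSH; host_mapped is a hypothetical helper name, and dlbox and 192.168.1.42 are made-up example values.

```shell
#!/usr/bin/env bash
# Succeed if the hosts file maps the given name on an uncommented line.
# Usage: host_mapped <hostname> [hosts-file]
host_mapped() {
  local name=$1 file=${2:-/etc/hosts}
  awk -v name="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
                       END { exit !found }' "$file"
}

# Example, assuming /etc/hosts contains the line "192.168.1.42 dlbox":
# host_mapped dlbox && ssh your_username@dlbox
```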