This repository contains homework assignments for the Kadenze Academy course on Creative Applications of Deep Learning w/ Tensorflow.
Session | Title | Description |
---|---|---|
Installation | Installation | Setting up Python/Notebook and necessary Libraries. |
Preliminaries | Preliminaries with Python | Basics of working with Python and images. |
1 | Creating a Dataset/Computing with Tensorflow | Working with a small dataset of images. Dataset preprocessing. Tensorflow basics. Sorting/organizing a dataset. |
2 | TBA | TBA. |
3 | TBA | TBA. |
4 | TBA | TBA. |
5 | TBA | TBA. |
- What is Notebook?
- Docker Toolbox
- Jupyter Notebook
- Navigating to Notebook
- Installing Python Packages
- CUDA/GPU instructions
- Testing it
- Troubleshooting
We will be using Jupyter Notebook. This will be necessary for submitting the homework and interacting with the guided session notebooks I will provide for each assignment. Follow along with this guide and we'll see how to obtain all of the necessary libraries that we'll be using. By the end of this, you'll have installed Jupyter Notebook, NumPy, SciPy, and Matplotlib. While many of these libraries aren't necessary for performing the Deep Learning which we'll get to in later lectures, they are incredibly useful for manipulating data on your computer, preparing data for learning, and exploring results.
Jupyter Notebook, called "IPython Notebook" prior to version 4.0, is a way of interacting with Python code using a web browser. It is a very useful instructional tool that we will be using for all of our homework assignments. Notebooks have the file extension "ipynb", an abbreviation of "IPython Notebook". Some websites such as nbviewer.ipython.org or www.github.com can render `.ipynb` files directly as HTML. However, these are not interactive versions of the notebook: they are not running the Python kernel which evaluates the code, so what you see is just a static version of the code contained inside the notebook.
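To make the "static version" point concrete, here is a minimal sketch of what sits inside an `.ipynb` file. It assumes the nbformat 4 layout, where a notebook is JSON text with a list of cells; the tiny notebook below is hand-written for illustration and is not one of the course notebooks, and `code_cells` is a hypothetical helper name.

```python
import json

# A minimal, hand-written notebook in the nbformat 4 layout (illustrative only).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Session 1"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["import os\n", "print(os.getcwd())"],
        },
    ],
}


def code_cells(nb):
    """Return the source text of every code cell in a notebook dict."""
    return ["".join(cell["source"]) for cell in nb["cells"]
            if cell["cell_type"] == "code"]


# Round-trip through JSON text, which is what an .ipynb file stores on disk.
text = json.dumps(notebook)
loaded = json.loads(text)
for source in code_cells(loaded):
    print(source)
```

A viewer such as nbviewer only has this text to work with, which is why it can display the code but cannot run it.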
In order to interact with a notebook and start coding, you will need to launch Terminal (for Mac and Linux users). For Windows users, or for anyone having any problems with the Linux/Mac instructions, please follow the next section on Docker Toolbox very closely! If you are not a Windows user, please first try skipping over the next section and use the installation instructions in the Jupyter Notebook section before trying Docker, as that solution will be much faster than running Docker.
Unfortunately, at the time of this writing (July 2016), there are no binaries for Tensorflow available for Windows users. The easiest way to get up and running is to use Docker. Docker is a way of managing a "virtual" Linux machine on your computer, which will let us create a machine capable of running Tensorflow. First, please download and install the Docker Toolbox:
https://www.docker.com/products/docker-toolbox
With this installed, you'll then need to run the "Docker Quickstart Terminal" which will launch a Terminal environment running on a virtual Linux machine on your computer. A virtual machine is basically an emulation of another machine. This is important because we'll use this machine to run Linux and install all of the necessary libraries for running Tensorflow.
Note, if you have trouble launching the Docker Quickstart Terminal because you have "Hyper-V", try one of the following, as suggested by Danilo Gasques:
Once the Docker Quickstart Terminal is launched, run the following commands (ignoring the `$` sign at the beginning of each line, which just denotes that the line is a terminal command that you should type out exactly and then hit ENTER afterwards):
$ cd
$ docker-machine ip
You should see your virtual machine's IP address as a result of the last command. This is the location of your virtual machine. NOTE THIS IP ADDRESS, as we'll need it in a second.
This next command will move to your Windows home directory, then create a new directory called "tensorflow", and then print out what the full path to that directory is. PLEASE NOTE DOWN THIS DIRECTORY. This is where everything will happen, and I'll explain that in a minute.
$ cd
$ mkdir tensorflow
$ echo /$(pwd)/tensorflow
Now run the following command, which will download about 530 MB containing everything we need to run Tensorflow, Python, and Jupyter Notebook (again, ignore the "$" at the beginning of the line only)!
$ docker run -it -p 8888:8888 -p 6006:6006 -v /$(pwd)/tensorflow:/notebooks --name tf pkmital/tf.0.9.0-py.3.4
The earlier commands first created a directory called tensorflow in the home directory, wherever that may be for your computer. The echo command that we ran, and whose output I asked you to note down, shows you exactly where that directory is. So on your Windows machine, you will want to put files inside this directory only when coding w/ Tensorflow. We will use Docker to mirror that directory on a virtual machine which has everything necessary for us to code in Python and Tensorflow. Whatever is in that directory will be mirrored on the virtual machine's directory under `/notebooks`.
You can also try running the docker run command with any other directory. For instance:
$ docker run -it -p 8888:8888 -p 6006:6006 -v /Users/YOURUSERNAME/Desktop:/notebooks --name tf pkmital/tf.0.9.0-py.3.4
This would mean that your Desktop is where you can move files around so that, on the virtual machine, you can interact with them under the `/notebooks` directory.
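The pieces of the `docker run` command can be read off one at a time. The sketch below assembles the same command from its parts; the function name `docker_run_command` is hypothetical, the flag descriptions are my own summary of Docker's `-it`, `-p`, `-v`, and `--name` options, and the note about TensorBoard reflects its default port rather than anything stated in this guide.

```python
IMAGE = "pkmital/tf.0.9.0-py.3.4"


def docker_run_command(host_dir, container_dir="/notebooks", name="tf", image=IMAGE):
    """Assemble the `docker run` command from this guide for a given host directory."""
    return [
        "docker", "run",
        "-it",                      # interactive session with a terminal attached
        "-p", "8888:8888",          # publish the Jupyter Notebook port
        "-p", "6006:6006",          # publish port 6006 (TensorBoard's default)
        "-v", "{}:{}".format(host_dir, container_dir),  # mirror host_dir into the container
        "--name", name,             # the name used later by `docker start -i tf`
        image,
    ]


print(" ".join(docker_run_command("/Users/YOURUSERNAME/Desktop")))
```

Only the `-v` argument changes between the variants shown in this section; the ports, container name, and image stay the same.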
For OSX users, if you are installing Docker because you had installation problems using Anaconda and pip, you would instead write the following command:
$ docker run -it -p 8888:8888 -p 6006:6006 -v $(pwd)/Desktop/tensorflow:/notebooks --name tf pkmital/tf.0.9.0-py.3.4
This command will download everything you need to run Tensorflow on your virtual machine.
When you want to start this machine, you will launch the Docker Quickstart Terminal and then write:
$ cd
$ docker start -i tf
Notice that the command prompt will now be `#` instead of `$`. You should have a new folder "tensorflow" inside your Home directory. This directory will be empty to begin with. Please make sure you do everything inside this directory only, or else any files you make on your virtual machine WILL BE ERASED once it is shut down! When you clone the CADL repository, or expand the contents of the downloaded zip file inside this directory via your Windows machine (it will be in your Home directory under a folder "tensorflow"), you will be able to access it via your Docker instance.
For instance, after running the `docker start -i tf` command, try going into the directory `/notebooks`:
# cd /notebooks
And then git cloning this repo:
# git clone https://github.com/pkmital/CADL
Now, inside the directory `/notebooks/CADL`, you will have this entire repo. Alternatively, you could download a zip file of this repo and use Windows to place it in the directory you noted down before.
Note: Windows/Docker users should scroll past this section to "Windows/Docker". For OSX/Linux users, the easiest way to ensure you have Python 3.4 or higher and Jupyter Notebook is to install Anaconda for Python 3.5 located here:
Make sure you restart your Terminal after you install Anaconda as there are some PATH variables that have to be set.
Then run the following:
$ curl https://bootstrap.pypa.io/ez_setup.py -o - | python
If you already have conda, but only have Python 2, you can very easily add a new environment w/ Python 3 and switch back and forth as needed. Or if you do not have Anaconda, but have a system based install, I'd really recommend either using Anaconda or pyenv to help you manage both python installations.
With Anaconda installed, you will have python and the package "ipython[notebook]", along with a ton of other very useful packages such as numpy, matplotlib, scikit-learn, scikit-image, and many others.
With everything installed, restart your Terminal application (on OSX, you can use Spotlight to find the Terminal application), and then navigate to the directory containing the "ipynb", or "IPython Notebook" file, by "cd'ing" (pronounced see-dee-ing) into that directory. This involves typing the command: "cd some_directory". Once inside the directory of the notebook file, you will then type: "jupyter notebook". If this command does not work, it means you do not have notebook installed! Try installing Anaconda as above, restart your Terminal application, or manually install notebook like so (ignore the "$" signs, which just denote that this is a Terminal command that you should type out exactly and then hit ENTER!):
$ pip3 install ipython[notebook]
$ jupyter notebook
If you run into issues that say something such as:
[W 20:37:40.543 NotebookApp] Kernel not found: None
Then please try first running:
$ ipython3 kernel install
For users running firewalls, you must make sure you have an exception as per the Jupyter Notebook Firewall Instructions, otherwise you may not be able to interact with the notebook. Namely, you will need to allow connections from 127.0.0.1 (localhost) on ports from 49152 to 65535.
Once inside your Docker container as outlined above, you can now launch notebook like so:
$ cd /notebooks
$ jupyter notebook &
Note on Virtual versus Windows Directories:
This is tricky to grasp, mostly because I didn't explain it. Docker is a "virtual" computer running inside your computer. It has its own filesystem and its own directories. So you can't reference your Windows machine's directories inside this machine. When you first ran docker (e.g. `docker run -it -p 8888:8888 -p 6006:6006 -v /$(pwd)/tensorflow:/notebooks --name tf pkmital/tf.0.9.0-py.3.4`), it included as part of its command: `-v /$(pwd)/tensorflow:/notebooks`. What that was doing is "mirroring" a directory on your Windows machine inside your virtual machine. So whatever was in your Windows machine under the directory `/$(pwd)/tensorflow` would appear in the virtual machine under `/notebooks`. That Windows directory is likely `/Users/<YOURUSERNAME>/tensorflow`. So put files ONLY inside that directory (create it if it doesn't exist) if you want to access them on the virtual machine.
So let's say your username was "pkmital". Then your home directory would be `/Users/pkmital`, and you would have mirrored `/Users/pkmital/tensorflow` on your Windows machine to the virtual machine under `/notebooks`. Now let's say I create a directory `/Users/pkmital/tensorflow/images` on my Windows machine, and then put a bunch of png files in there. I will then see them in my virtual machine under `/notebooks/images`. If I put the CADL repository inside `/Users/pkmital/tensorflow`, then I should have `/Users/pkmital/tensorflow/CADL/session-1/session-1.ipynb`, and on the virtual machine it will be in `/notebooks/CADL/session-1/session-1.ipynb`. From this notebook, running on the virtual machine and accessed with Jupyter Notebook, I would access my images like so:
import os
os.listdir('../../images')
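The mirroring rule can be written down as a small path translation. This is only a sketch under the assumptions of the example above (username "pkmital", host directory mirrored to `/notebooks`); `host_to_container` is a hypothetical helper, not part of Docker or of this repository.

```python
import posixpath

# Assumed values, taken from the example in this note.
HOST_ROOT = "/Users/pkmital/tensorflow"
CONTAINER_ROOT = "/notebooks"


def host_to_container(host_path, host_root=HOST_ROOT, container_root=CONTAINER_ROOT):
    """Map a path inside the mirrored host directory to its path in the container.

    Returns None when the path lies outside the mirrored directory, because
    such files are not visible inside the container.
    """
    host_path = posixpath.normpath(host_path)
    host_root = posixpath.normpath(host_root)
    if host_path != host_root and not host_path.startswith(host_root + "/"):
        return None
    relative = posixpath.relpath(host_path, host_root)
    return posixpath.normpath(posixpath.join(container_root, relative))


notebook_dir = host_to_container("/Users/pkmital/tensorflow/CADL/session-1")
# Resolve the relative path used in the notebook against the notebook's directory.
images_dir = posixpath.normpath(posixpath.join(notebook_dir, "../../images"))
print(notebook_dir)  # /notebooks/CADL/session-1
print(images_dir)    # /notebooks/images
```

The second print shows why `'../../images'` works from inside `session-1`: two levels up from the notebook is `/notebooks`, which is the mirrored directory itself.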
After running "jupyter notebook &", you should see a message similar to:
root@182bd64f27d2:~# jupyter notebook &
[I 21:15:33.647 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[W 21:15:33.712 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using encryption. This is not recommended.
[W 21:15:33.713 NotebookApp] WARNING: The notebook server is listening on all IP addresses and not using authentication. This is highly insecure and not recommended.
[I 21:15:33.720 NotebookApp] Serving notebooks from local directory: /root
[I 21:15:33.721 NotebookApp] 0 active kernels
[I 21:15:33.721 NotebookApp] The IPython Notebook is running at: http://[all ip addresses on your system]:8888/
[I 21:15:33.721 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Don't worry if the IP address or command prompt look different. Note where it says: `The IPython Notebook is running at`. If you are running Docker (Windows users), this is where we need that IP address. For OSX/Linux users, we'll use "localhost" so don't worry about this. Now open up Chrome/Safari/Firefox, or whatever browser you like, and then navigate to:
http://localhost:8888
or for Windows users:
http://ADDRESS:8888
where ADDRESS is the IP address you should have noted down before. For instance, on my machine, I would visit the website:
This will launch the Jupyter Notebook where you will be able to interact with the homework assignments!
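If the page does not load, it can help to check the address and port separately from the browser. The following is a rough sketch, assuming the notebook server is on port 8888 as in the log output above; `notebook_url` and `is_listening` are hypothetical helper names, and the IPv4 check assumes the address came from `docker-machine ip`.

```python
import ipaddress
import socket


def notebook_url(host="localhost", port=8888):
    """Build the address to type into the browser."""
    if host != "localhost":
        # Raises ValueError if the noted-down address is not a valid IPv4 address.
        ipaddress.IPv4Address(host)
    return "http://{}:{}".format(host, port)


def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(notebook_url())
print(is_listening("localhost", 8888))
```

Docker users would pass the IP address they noted down as `host` to both functions. A result of `False` suggests the server is not running, or that a firewall is blocking the port.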
Packages are libraries or useful extensions to the standard Python libraries. In this course, we'll be using a few including Tensorflow, NumPy, Matplotlib, SciPy, scikit-image, and scikit-learn. Windows users will already have these libraries since the Docker container includes them. However, if you need to, you can install them using "pip", which is the Python package manager. OSX/Linux users should follow these steps just to be sure they have the latest versions of these packages. In Python 3.4 and higher, `pip` comes with any standard Python installation. In order to use `pip`, you'll write:
$ pip3 install some_package
To get the necessary libraries:
$ pip3 install "scikit-image>=0.11.3" "numpy>=1.11.0" "matplotlib>=1.5.1" "scikit-learn>=0.17"
This should get you all of the libraries we need for the course, EXCEPT for Tensorflow. Tensorflow is a special case, but can be `pip` installed in much the same way by pointing pip to the wheel URL corresponding to your OS and Python version like so:
# Ubuntu/Linux 64-bit, CPU only, Python 3.4:
$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.9.0-cp34-cp34m-linux_x86_64.whl
# Ubuntu/Linux 64-bit, CPU only, Python 3.5:
$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.9.0-cp35-cp35m-linux_x86_64.whl
# Mac OS X, CPU only, Python 3:
$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/tensorflow-0.9.0-py3-none-any.whl
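Which of the three commands applies depends on your operating system and Python version. As a rough sketch, the choice can be expressed as a lookup; it only knows the three wheels listed in this guide, `wheel_url` is a hypothetical helper, and it returns `None` for any other setup (for example GPU builds or other Python versions).

```python
import platform
import sys

BASE = "https://storage.googleapis.com/tensorflow"

# The three TensorFlow 0.9.0 wheels named in this guide, keyed by
# (operating system, Python version). None means "any Python 3".
WHEELS = {
    ("Linux", (3, 4)): BASE + "/linux/cpu/tensorflow-0.9.0-cp34-cp34m-linux_x86_64.whl",
    ("Linux", (3, 5)): BASE + "/linux/cpu/tensorflow-0.9.0-cp35-cp35m-linux_x86_64.whl",
    ("Darwin", None): BASE + "/mac/tensorflow-0.9.0-py3-none-any.whl",
}


def wheel_url(system, version):
    """Return the matching wheel URL, or None if this guide lists none."""
    major_minor = tuple(version[:2])
    if major_minor[0] != 3:
        return None
    return WHEELS.get((system, major_minor)) or WHEELS.get((system, None))


# platform.system() reports "Linux", "Darwin" (OSX), or "Windows".
print(wheel_url(platform.system(), sys.version_info))
```

If this prints `None`, none of the three commands matches your machine as-is.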
You can pip install Tensorflow for most OSX/Linux setups, including those that are making use of NVIDIA GPUs and CUDA, using one of the packages listed on this link: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#pip-installation
If you are having trouble with pip installation, try looking here first: Common Installation Problems. Failing that, reach out to us on the forums, or else you may want to instead run a Docker instance as outlined in the Windows instructions above: Setting up a Docker Container.
Note that I have not provided instructions on getting set up w/ CUDA as it is beyond the scope of this course! If you are interested in using GPU acceleration, I highly recommend using Ubuntu Linux and setting up a machine on Nimbix or Amazon EC2 using the instructions here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#optional-install-cuda-gpus-on-linux. If you're using Nimbix, you can skip the install process as there is already a machine pre-installed w/ Tensorflow. Similarly, for Amazon EC2, there are many existing "images" of machines that have Tensorflow already installed.
To confirm it worked, try running:
$ python3 -c 'import tensorflow as tf; print(tf.__version__)'
You should see 0.9.0 printed.
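The other packages can be checked in a similar way. Below is a minimal sketch that compares installed versions against the minimums from the `pip3 install` command earlier in this guide. The helper names are hypothetical, the version parsing is deliberately simple (it only reads the leading numeric parts), and it assumes each package exposes a `__version__` attribute.

```python
import importlib

# Minimum versions copied from the pip3 command in this guide.
# Keys are import names, which differ from the pip names for scikit-*.
MINIMUMS = {
    "skimage": "0.11.3",
    "numpy": "1.11.0",
    "matplotlib": "1.5.1",
    "sklearn": "0.17",
}


def version_tuple(version):
    """Turn '1.11.0' into (1, 11, 0), ignoring non-numeric parts like 'rc1'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check(module_name, minimum):
    """Return 'not importable', 'too old (x)', or 'ok (x)' for one package."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # ImportError, or a broken install failing at import time
        return "not importable"
    installed = getattr(module, "__version__", "0")
    have, need = version_tuple(installed), version_tuple(minimum)
    # Pad with zeros so that '1.11' and '1.11.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    status = "ok" if have >= need else "too old"
    return "{} ({})".format(status, installed)


for name, minimum in sorted(MINIMUMS.items()):
    print(name, check(name, minimum))
```

Any line reading "not importable" or "too old" points at a package to install or upgrade with `pip3`.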
You may have different versions of Python installed. You can troubleshoot this by looking at the output of:
$ which python3
$ which pip3
$ python3 --version
$ pip3 --version
$ which python
$ which pip
$ python --version
$ pip --version
You may simply need to install tensorflow using `pip` instead of `pip3` and/or use `python` instead of `python3`, assuming they point to a version of Python which is Python 3 or higher.
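The same information can be gathered from inside Python, which also tells you which interpreter is actually running your code. This is a small sketch using only the standard library; `describe_interpreter` is a hypothetical helper name.

```python
import shutil
import sys


def describe_interpreter():
    """Collect the facts that the shell commands above print one by one."""
    return {
        "running": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "is_python3": sys.version_info[0] >= 3,
        # shutil.which returns None when a command is not on the PATH.
        "python": shutil.which("python"),
        "python3": shutil.which("python3"),
        "pip": shutil.which("pip"),
        "pip3": shutil.which("pip3"),
    }


for key, value in describe_interpreter().items():
    print("{:>10}: {}".format(key, value))
```

Run it once with `python` and once with `python3`. If the "running" paths differ, the two commands point at different installations, and packages installed for one will not be visible to the other.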
You could be running python inside a directory that contains the folder "tensorflow". Python may then try to import that local folder instead of the installed package. Try running python inside a different directory.
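One way to check for this situation is sketched below. It makes an assumption about what counts as shadowing: a local folder that contains an `__init__.py` (a regular package), or a local `tensorflow.py` file, when Python is started from that directory. A bare folder without `__init__.py` is normally treated as a namespace package and loses to an installed package, so it is not reported. `shadowing_paths` is a hypothetical helper name.

```python
import os


def shadowing_paths(name, directory):
    """List paths in `directory` that `import name` would find before an installed package."""
    candidates = []
    package_dir = os.path.join(directory, name)
    # A folder containing __init__.py is a regular package.
    if os.path.isfile(os.path.join(package_dir, "__init__.py")):
        candidates.append(package_dir)
    module_file = os.path.join(directory, name + ".py")
    if os.path.isfile(module_file):
        candidates.append(module_file)
    return candidates


print(shadowing_paths("tensorflow", os.getcwd()))
```

An empty list means nothing in the current directory is likely to interfere with `import tensorflow`.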
If you encounter the following when trying to run a TensorFlow program:
ImportError: libcudart.so.7.0: cannot open shared object file: No such file or directory
Make sure you followed the GPU installation instructions. If you built from source, and you left the Cuda or cuDNN version empty, try specifying them explicitly.
The TensorFlow pip package depends on protobuf pip package version 3.0.0b2. Protobuf's pip package downloaded from PyPI (when running `pip install protobuf`) is a Python-only library, with Python implementations of proto serialization/deserialization which can be 10x-50x slower than the C++ implementation. Protobuf also supports a binary extension for the Python package that contains fast C++ based proto parsing. This extension is not available in the standard Python-only pip package. We have created a custom binary pip package for protobuf that contains the binary extension. Follow these instructions to install the custom binary protobuf pip package:
# Ubuntu/Linux 64-bit:
$ pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.0.0b2.post2-cp27-none-linux_x86_64.whl
# Mac OS X:
$ pip install --upgrade https://storage.googleapis.com/tensorflow/mac/protobuf-3.0.0b2.post2-cp27-none-any.whl
and for Python 3 :
# Ubuntu/Linux 64-bit:
$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.0.0b2.post2-cp34-none-linux_x86_64.whl
# Mac OS X:
$ pip3 install --upgrade https://storage.googleapis.com/tensorflow/mac/protobuf-3.0.0b2.post2-cp35-none-any.whl
Install the above package after you have installed TensorFlow via pip, as the standard `pip install tensorflow` would install the Python-only pip package. The above pip package will overwrite the existing protobuf package.
Note that the binary pip package already has support for protobufs larger than 64MB, which should fix errors such as these:
[libprotobuf ERROR google/protobuf/src/google/protobuf/io/coded_stream.cc:207] A
protocol message was rejected because it was too big (more than 67108864 bytes).
To increase the limit (or to disable these warnings), see
CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
ImportError: Traceback (most recent call last):
File "/usr/local/lib/python3.4/dist-packages/tensorflow/core/framework/graph_pb2.py", line 6, in <module>
from google.protobuf import descriptor as _descriptor
ImportError: cannot import name 'descriptor'
If you see the above error when upgrading to a newer version of TensorFlow, try uninstalling both TensorFlow and protobuf (if installed) and re-installing TensorFlow (which will also install the correct protobuf dependency).
If, during `pip install`, you encounter an error like:
...
IOError: [Errno 2] No such file or directory: '/tmp/pip-o6Tpui-build/setup.py'
Solution: upgrade your version of pip:
pip install --upgrade pip
This may require `sudo`, depending on how pip is installed.
If, during pip install from a URL, you encounter an error like:
...
SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed
Solution: Download the wheel manually via curl or wget, and pip install locally.
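As a sketch of that two-step workaround, the snippet below derives the local file name from a wheel URL and prints the two commands to run in the Terminal. It does not download anything itself. `manual_install_commands` is a hypothetical helper, the example URL is the Mac wheel from earlier in this guide, and `pip3` is assumed to be the right pip for your Python 3 (substitute `pip` if that is what works on your machine).

```python
import posixpath
from urllib.parse import urlparse


def manual_install_commands(wheel_url):
    """Return the two shell commands: download the wheel, then install the local file."""
    filename = posixpath.basename(urlparse(wheel_url).path)
    return [
        "curl -O " + wheel_url,     # -O saves the file under its remote name
        "pip3 install " + filename,  # install from the file in the current directory
    ]


url = "https://storage.googleapis.com/tensorflow/mac/tensorflow-0.9.0-py3-none-any.whl"
for command in manual_install_commands(url):
    print(command)
```

The wheel's file name has to stay unchanged after downloading, since pip reads the package name, version, and platform tags from it.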