Table of Contents
Here is the hardware configuration of the main box we used for this competition:
# CPU
6 core / 12 virtual with hyperthreding - Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz base clock + turbo+
# GPU
NVIDIA GM204 [GeForce GTX 980]
NVIDIA GM204 [GeForce GTX 980]
# Memory
DDR3 DIMM Mem: 32G Swap: 225G
# Hard Drive Space
root: / 100GB
home: /home 1.8TB
# OS
Ubuntu 14.04 LTS
The follwing setup assumes you have Ubuntu 14.04 LTS
Installing python & sklearn:
-
Install build-essential, which is a package in Ubuntu which includes gcc and other build tools
-
$ sudo apt-get install python-dev
-
$ pip install --user -e scikit-learn
-
Installing CUDA:
-
Install Cuda 7.0 (preferably 7.5 if you have GTX 980) by downloading the deb/run file from NVIDIA's website. Do not install it from the ubuntu ppa repos. The latest version there is always lagging behind.
-
Mostly follow: The cuda getting started guide
- If for some reason (such as interaction with nouveau driver, etc ...)
the driver doesn't load (which you can check by running
$ cat /proc/driver/nvidia/version
, run$ sudo ldconfig
- If for some reason (such as interaction with nouveau driver, etc ...)
the driver doesn't load (which you can check by running
-
Run
$ nvidia-smi
to check for proper installation -
Preferably run
$ cuda-install-samples-6.5.sh ~
and run some of the examples.
Installing Theano:
-
Install Theano dependencies first
- numpy
- scipy
- pip
- Just use the following command:
$ suodo apt-get install numpy scipy
-
$ git clone git://github.com/Theano/Theano.git
-
$ pip install --user -e Theano
- This installs Theano from the git repository you just cloned. The
--user
is needed if you don't want to install it system wide. You should do this so you don't pollute the system wide python packages. Virtualenv is also an option.
- This installs Theano from the git repository you just cloned. The
-
Follow The using_gpu tutorial to test Theano with the GPU
-
Place the following in ~/.theanorc
[global] device = gpux # where x can be 0,1,2, ... GPU ID you want to make the default floatX = float32
Installing Lasagne and Nolearn:
-
After making sure that Theano setup works, you can always use the following command to install the latest Theano, Lasagne, & Nolearn packages:
-
pip install --no-deps git+git://github.com/Theano/Theano
-
pip install --no-deps git+git://github.com/benanne/Lasagne
-
pip install --no-deps git+git://github.com/dnouri/nolearn
-
The file submission.py is designed to be used to recreate the ensemble that was used to generate our third place submission. To generate our result, folow these four steps:
-
Install all depenencies as descriped below.
-
Edit SETTINGS.json:
-
Set
train_dir
andtest_dir
to point to the correct training and testing data. -
Set
dump_dir
andsubmission_dir
to point to directories in which to write the trained models and the csv files respectively. -
Optionally set
submission_workers
andtheano_flags
.submission_workers
selects how many worker processes submission.py should spawn when training.theano_flags
is a list of flags to pass to the theano. The primary use for this is to spread training across multiple GPUs. For instance `"theano_flags" : ["device=gpu0", "device=gpu1"] will run half the workers on gpu0 and half on gpu1.
-
-
Run
python submission.py run
-- this will run all the models specified in final_nets.yml, write the train models todump_dir
and the csv output files tosubmission_dir
. If a dump file with the corresponding name already exists in dump_dir, then the training step will be skipped. If a csv file with the corresponding name already exists in submission_dir then the submission creation step will also be skipped. If all nets are being rerun from scratch, this may take several days, depending on your hardware. -
Run
python submission.py ensemble
. This will comput a weighted average of all of the nets created in step 3 using the weighting scheme we used in our best submission. The results will be written tosubmission_dir
with the name ensemble.csv.
In addition to the above procedure for generating ensemble.csv, submission.py has a few other potentially useful commands.
-
python submission.py dry
-- performs a dry run; instantiating all nets but not training -
python submission.py
-- show a numbered list of the nets in final_nets.yml -
python submission.py run <N>
-- train and dump only net number N, where N is the number given py the previous command.
Note that python submission.py run
will not overwrite already existing
files, so if you wish to replace files you will have to remove them manually
before starting the run.
To make a prediction on a new test set, one would simply make a new test directory containing the new data and point SETTINGS.json to that directory.