SLIDE
The SLIDE package contains the source code for reproducing the main experiments in this paper.
Dataset
The datasets can be downloaded from Amazon-670K. Note that the data is sorted by label, so please shuffle at least the validation/testing data.
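Because the files are sorted by label, a plain line-level shuffle is usually enough. The following is a minimal sketch, not part of SLIDE; the function names are hypothetical. It assumes one example per line and, optionally, a single header line (for example, dataset sizes) that must stay first. It reads the whole file into memory, and a fixed seed keeps the shuffled split reproducible.

```python
import random


def shuffle_lines(lines, has_header=True, seed=0):
    """Return a shuffled copy of `lines`, keeping an optional header line first."""
    if has_header:
        header, body = lines[:1], lines[1:]  # slicing copies, so input is untouched
    else:
        header, body = [], lines[:]
    rng = random.Random(seed)  # fixed seed -> reproducible order
    rng.shuffle(body)
    return header + body


def shuffle_file(src, dst, has_header=True, seed=0):
    """Shuffle the examples of `src` and write them to `dst` (hypothetical helper)."""
    with open(src) as f:
        # Normalize the final line so shuffled lines never get glued together.
        lines = [ln if ln.endswith("\n") else ln + "\n" for ln in f]
    with open(dst, "w") as f:
        f.writelines(shuffle_lines(lines, has_header, seed))
```

For example, `shuffle_file("tst.txt", "tst_shuffled.txt")` would write a shuffled copy of a test file; the file names here are placeholders.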
TensorFlow Baselines
We suggest directly pulling the TensorFlow Docker image to install TensorFlow-GPU. For TensorFlow-CPU compiled with AVX2, we recommend using this precompiled build.
There is also a TensorFlow Docker image built specifically for CPUs with AVX-512 instructions. To get it, use:
docker pull clearlinux/stacks-dlrs_2-mkl
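Choosing between the AVX2 build and the AVX-512 image depends on which instruction sets the CPU reports. On Linux x86 machines these appear in the `flags` line of `/proc/cpuinfo` as `avx2` and `avx512f`. The following is a small sketch that assumes that file layout; the helper names are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Parse the first 'flags' line of /proc/cpuinfo text into a set of flag names."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()  # e.g. non-x86 systems, where the field is named differently


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read and parse the local CPU flags (Linux only)."""
    with open(path) as f:
        return cpu_flags(f.read())
```

For example, `"avx512f" in read_cpu_flags()` is `False` on Haswell-class CPUs, which lack AVX-512. On such CPUs the AVX-512 image does not apply, and the `-mavx512f` flag would also need to be removed before building SLIDE.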
config.py controls the parameters of TensorFlow training, such as the learning rate. example_full_softmax.py and example_sampled_softmax.py are example files for the Amazon-670K dataset with full softmax and sampled softmax, respectively.
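The two example files differ in how the output-layer loss is computed. Full softmax scores every label, while sampled softmax scores only the true label plus a random subset of negatives. The following pure-Python sketch illustrates that difference. It uses plain uniform negative sampling and is not TensorFlow's `tf.nn.sampled_softmax_loss`, which also applies a sampling-probability correction. All names here are illustrative.

```python
import math
import random


def _cross_entropy(logits, target):
    """-log softmax(logits)[target], computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def _logit(w, h):
    """Dot product of one output-layer weight row with the hidden vector."""
    return sum(a * b for a, b in zip(w, h))


def full_softmax_loss(weights, h, label):
    """Scores every class: cost grows with the number of labels (670K here)."""
    logits = [_logit(w, h) for w in weights]
    return _cross_entropy(logits, label)


def sampled_softmax_loss(weights, h, label, num_sampled, rng):
    """Scores only the true class plus `num_sampled` uniformly drawn negatives."""
    candidates = [c for c in range(len(weights)) if c != label]
    negatives = rng.sample(candidates, num_sampled)
    classes = [label] + negatives  # true class sits at index 0
    logits = [_logit(weights[c], h) for c in classes]
    return _cross_entropy(logits, 0)
```

Because the sampled loss normalizes over a subset of the classes, it never exceeds the full loss. When every negative is sampled, the two losses coincide.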
Run
python python_examples/example_full_softmax.py
python python_examples/example_sampled_softmax.py
Running SLIDE
Dependencies
- CMake v3.0 and above
- C++11-compliant compiler
- Linux: Ubuntu 16.04 and newer
- Transparent Huge Pages must be enabled.
- SLIDE requires approximately 900 2MB pages and 10 1GB pages (see Instructions).
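Before running SLIDE, it can help to confirm that these page pools are configured. The stated requirement pins about 900 × 2 MiB + 10 × 1024 MiB = 12,040 MiB (roughly 11.8 GiB) of RAM. The following is a sketch that assumes the standard Linux sysfs layout: `/sys/kernel/mm/hugepages/hugepages-<size>kB/nr_hugepages` for the pools and `/sys/kernel/mm/transparent_hugepage/enabled` for Transparent Huge Pages, where the active mode is shown in brackets. The helper names are hypothetical and not part of SLIDE.

```python
import os

# Page counts this README says SLIDE needs, keyed by sysfs directory name.
REQUIRED = {"hugepages-2048kB": 900, "hugepages-1048576kB": 10}
HUGEPAGES_ROOT = "/sys/kernel/mm/hugepages"
THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"


def thp_mode(enabled_text):
    """Return the bracketed THP mode, e.g. 'always [madvise] never' -> 'madvise'."""
    for token in enabled_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def hugepage_shortfall(available):
    """Map each page size to how many more pages must be reserved (0 if enough)."""
    return {size: max(0, need - available.get(size, 0))
            for size, need in REQUIRED.items()}


def read_available(root=HUGEPAGES_ROOT):
    """Read the configured pool size for each required page size, if present."""
    available = {}
    for size in REQUIRED:
        path = os.path.join(root, size, "nr_hugepages")
        if os.path.exists(path):
            with open(path) as f:
                available[size] = int(f.read().strip())
    return available


def read_thp_mode(path=THP_ENABLED):
    with open(path) as f:
        return thp_mode(f.read())
```

On a Linux host, `hugepage_shortfall(read_available())` reports any missing pages, and a `read_thp_mode()` result of `"never"` means Transparent Huge Pages are disabled.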
Notes:
- For simplicity, please refer to our Docker image, which has all environments installed. To replicate the experiment without setting up Hugepages, please download Amazon-670K to the path /home/code/HashingDeepLearning/dataset/Amazon
- Also note that this Hugepages-optimized version supports only Skylake or newer architectures. For older Haswell processors, which lack AVX-512, remove the flag -mavx512f from the OPT_FLAGS line in the Makefile. Alternatively, you can revert to commit 2d10d46b5f6f1eda5d19f27038a596446fc17cee to skip the HugePages optimization and still use SLIDE, at the cost of up to roughly 30% slower performance.
- This version builds all dependencies (currently ZLIB and CNPY).
Commands
After cloning, change the paths in ./SLIDE/Config_amz.csv to match your setup.
git clone https://github.com/sarthakpati/HashingDeepLearning.git
cd HashingDeepLearning
mkdir bin
cd bin
cmake ..
make
./runme ../SLIDE/Config_amz.csv
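A mis-edited path in Config_amz.csv is an easy mistake to make, so a quick pre-flight check can save a failed run. The exact layout of that file is not described here. The following sketch therefore assumes that each row is a key followed by one or more comma-separated values. It flags values that look like filesystem paths (starting with `/`, `./` or `../`) but do not exist. The function names are hypothetical and not part of SLIDE. Relative paths are checked against the current working directory, so run the check from the directory you launch ./runme from.

```python
import csv
import os

PATH_PREFIXES = ("/", "./", "../")


def find_missing_paths(lines, exists=os.path.exists):
    """Return (key, path) pairs whose path-like values do not exist.

    Assumes a hypothetical `key,value[,value...]` row layout.
    """
    missing = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        key = row[0].strip()
        for value in row[1:]:
            value = value.strip()
            if value.startswith(PATH_PREFIXES) and not exists(value):
                missing.append((key, value))
    return missing


def check_config(config_file):
    """Open the config file and report missing paths."""
    with open(config_file, newline="") as f:
        return find_missing_paths(f)
```

From the bin directory, `check_config("../SLIDE/Config_amz.csv")` returns an empty list when every path-like entry exists, under the layout assumption above.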