This is a multicore modification of Barnes-Hut t-SNE by L. Van der Maaten with python and Torch CFFI-based wrappers. This code also works faster than sklearn.TSNE on 1 core.
Barnes-Hut t-SNE is done in two steps.
-
First step: an efficient data structure for nearest neighbours search is built and used to compute probabilities. This can be done in parallel for each point in the dataset, this is why we can expect a good speed-up by using more cores.
-
Second step: the embedding is optimized using gradient descent. This part is essentially consecutive so we can only optimize within iteration. In fact some parts can be parallelized effectively, but not all of them a parallelized for now. That is why second step speed-up will not be that significant as first step sepeed-up but there is still room for improvement.
So when can you benefit from parallelization? It is almost true, that the second step computation time is constant of D
and depends mostly on N
. The first part's time depends on D
a lot, so for small D
time(Step 1) << time(Step 2)
, for large D
time(Step 1) >> time(Step 2)
. As we are only good at parallelizing step 1 we will benefit most when D
is large enough (MNIST's D = 784
is large, D = 10
even for N=1000000
is not so much). I wrote multicore modification originally for Springleaf competition, where my data table was about 300000 x 3000
and only several days left till the end of the competition so any speed-up was handy.
Interestingly, that this code beats other implementations. We compare to sklearn
(Barnes-Hut of course), L. Van der Maaten's bhtsne, py_bh_tsne repo (cython wrapper for bhtsne with QuadTree). perplexity = 30, theta=0.5
for every run. In fact py_bh_tsne repo works at the same speed as this code when using more optimization flags for compiler.
This is a benchmark for 70000x784
MNIST data:
Method | Step 1 (sec) | Step 2 (sec) |
---|---|---|
MulticoreTSNE(n_jobs=1) | 912 | 350 |
bhtsne | 4257 | 1233 |
py_bh_tsne | 1232 | 367 |
sklearn(0.18) | ~5400 | ~20920 |
I did my best to find what is wrong with sklearn numbers, but it is the best benchmark I could do (you can find test script in python/tests
folder).
This table shows a relative to 1 core speed-up when using n
cores.
n_jobs | Step 1 | Step 2 |
---|---|---|
1 | 1x | 1x |
2 | 1.54x | 1.05x |
4 | 2.6x | 1.2x |
8 | 5.6x | 1.65x |
Python and torch wrappers are available.
Make sure cmake
is installed on your system.
To install the package please do:
git clone https://github.com/DmitryUlyanov/Multicore-TSNE.git
cd Multicore-TSNE/
pip install --no-cache-dir .
It's important that you add --no-cache-dir
otherwise pip won't copy
the .so
file which is needed at runtime.
For installation on MacOS, before running pip install --no-cache-dir .
, you need to
- install gcc with `brew install gcc --without-multilib
- change line 9 of CMakeLists.txt to
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS} -O3 -fPIC -ffast-math -funroll-loops -lstdc++")
- run
export CC="/usr/local/Cellar/gcc/X.x.x/bin/gcc-X"; export CXX="/usr/local/Cellar/gcc/X.x.x/bin/gcc-X"
, whereX
andx
refers to the version ofgcc
; in my case, e.g., the path reads/usr/local/Cellar/gcc/6.3.0_1/bin/gcc-6
Tested with both Python 2.7 and 3.6 (conda) and Ubuntu 14.04.
You can use it as a drop-in replacement for sklearn.manifold.TSNE.
from MulticoreTSNE import MulticoreTSNE as TSNE
tsne = TSNE(n_jobs=4)
Y = tsne.fit_transform(X)
Please refer to sklearn TSNE manual for parameters explanation.
Only double arrays are supported for now. For this implementation n_components
is fixed to 2
, which is the most common case (use Barnes-Hut t-SNE or sklearn otherwise). Also note that some of the parameters will be ignored for sklearn compatibility. Only these parameters are used (and they are the most important ones):
- perplexity
- n_iter
- angle
You can test it on MNIST dataset with the following command:
python python/tests/test.py <n_jobs>
To make the computation log visible in jupyter please install wurlitzer
(pip install wurlitzer
) and execute this line in any cell beforehand:
%load_ext wurlitzer
Memory leakages are possible if you interrupt the process. Should be OK if you let it run until the end.
To install execute the following command from repository folder:
luarocks make torch/tsne-1.0-0.rockspec
or
luarocks install https://raw.githubusercontent.com/DmitryUlyanov/Multicore-TSNE/master/torch/tsne-1.0-0.rockspec
You can run t-SNE like that:
tsne = require 'tsne'
Y = tsne(X, n_components, perplexity, n_iter, angle, n_jobs)
torch.DoubleTensor
type only supported for now.
Inherited from original repo's license.
- Allow other types than double
- Improve step 2 performance (possible)
Please cite this repository if it was useful for your research:
@misc{Ulyanov2016,
author = {Ulyanov, Dmitry},
title = {Muticore-TSNE},
year = {2016},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/DmitryUlyanov/Muticore-TSNE}},
}
Of course, do not forget to cite L. Van der Maaten's paper