/t-SNE

Implementation of t-SNE and Barnes-Hut-SNE algorithm. Comparison of algorithm implementation with sklearn library implementation on sample databases.

Primary LanguagePython

t-Distributed Stochastic Neighbor Embedding (t-SNE algorithm)

Table of contents

  1. t-SNE algorithm
  2. Barnes-Hut-SNE algorithm
  3. t-SNE and Barnes-Hut-SNE project structure
  4. Databases and results
  5. References

t-SNE algorithm

t-SNE algorithm represents non linear dimensionality reduction method which maps high-dimensional data X = {x1, x2, ..., xn} in low dimensional data Y = {y1, y2, ..., yn} such that we preserve as much information as possible. This is done by making pairwise similarities pij between high dimensional points X as similar as possible to pairwise similarities qij between low dimensional points Y.
Pairwise similarities pij are defined with Gaussian distributions:


Where each of Gaussian variances centered in data point xi we can obtain by binary search of sigma_i for predefined perplexity (neighborhood points):


Pairwise similarities qij are defined with Student t-distribution:


Student t-distribution is used to overcome "Crowding Problem". Similarities between same points are set to zero so pij and qij are set to 0.
For mapping to be successful we want that these high dimensional distributions pij are as same as possible to low dimensional distributions qij. Hance, Kullback-Leibler divergence is used as criterium function which is minimized:

KL divergence is minimized using gradient decent algorithm, with adaptive learning rate and momentum.

Pseudo code od the algorithm can be found below:

Barnes-Hut-SNE algorithm

Time and memory complexity of t-SNE algorithm is O(N^2), where N is number of data points, which is not appropriate for datasets with more than few thousand points. Barnes-Hut-SNE is approximation of t-SNE algorithms which requires O(NlogN) time and memory complexity and can be used for large datasets. Barnes-Hut-SNE uses 2 approximations:
 1. Approximation of input data similarity pij
 2. Approximation of t-SNE gradient calculation

First approximation is done using k-dimensional (k-d) tree for finding the first p=3*perplexity neighbors of each input data point. Complexity of this approach is O(NlogN). Example k-d tree constructed on synthetic data is presented in picture below:


t-SNE KL gradient is defined with formula:


KL Gradient can be represented as follows:


F_attr can be calculated in O(pN) time complexity. F_rep requires O(N^2) time complexity which we can reduce to O(NlogN) using Barnes-Hut approximation. Barnes-Hut-SNE constructs quad tree on output (low-dimensional) data Y and in each iteration of calculation of F_rep it decides if current node can be used as summary of contribution to F_rep for all the data inside that node. Example quad tree constructed on synthetic data is presented in picture below:


t-SNE and Barnes-Hut-SNE project structure

Project structure can be found in picture below. Module tsne_algorithm is the core module and has t-SNE and Barnes-Hut-SNE implemented using only numpy. Implementation of tree structures (k-d tree and quad tree) can be found in module tsne_algorithm/trees. Module tsne_test_script has scripts for testing t-SNE implementation and it is used for results visualisation. Folder dataset contains data used for testing t-SNE implementation. Test data can be obtaind just by unzipping 7z files.


Databases and results

Two databases are use as test of tsne and Barnes-Hut-SNE implementation:
 1. MNIST - 6000 images of digits 0-9 with resolution 28x28. Sample images can be found below


 2. Olivetti faces - 400 images of 40 different people with resolution 64x64. Sample images can be found below.


Results on MNIST dataset on 4000 samples of t-SNE "exact" method


Results on MNIST dataset on 1000 samples of Barnes-Hut-SNE method


Results on Olivetti faces dataset of t-SNE "exact" method


Results on Olivetti faces dataset of Barnes-Hut-SNE method


References

Implementation of t-SNE algorithm references
t-SNE algorithm:
https://distill.pub/2016/misread-tsne/
https://towardsdatascience.com/t-sne-clearly-explained-d84c537f53a
https://jeremy9959.net/Blog/tsne_annotated-fixed/
http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
Adaptive learning rate update:
https://web.cs.umass.edu/publication/docs/1987/UM-CS-1987-117.pdf
K-d tree:
https://upcommons.upc.edu/bitstream/handle/2117/76382/hpgm_15_1.pdf
https://arxiv.org/pdf/cs/9901013.pdf
https://en.wikipedia.org/wiki/K-d_tree
Neighbors Heap:
https://www.hackerearth.com/practice/notes/heaps-and-priority-queues/
Quad tree
https://en.wikipedia.org/wiki/Quadtree
Barnes-Hut approximation in t-SNE:
https://arxiv.org/pdf/1301.3342.pdf
https://jheer.github.io/barnes-hut/