
t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction algorithm derived from Stochastic Neighbor Embedding (SNE). It embeds high-dimensional data in a low-dimensional space while preserving much of the local structure and revealing global structure such as clusters. SNE works in four steps:

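  1. Convert the pairwise distances between high-dimensional datapoints into conditional probabilities (similarities), assuming each datapoint $x_i$ picks $x_j$ as its neighbor in proportion to a Gaussian density centered at $x_i$:

     $$p_{j \mid i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}$$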

  1. Each high-dimensional datapoint $x_i$ has its own variance $\sigma_i^2$, which reflects how dense or sparse the region around it is. Each value of $\sigma_i$ induces a probability distribution $P_i$ over all other datapoints. To select $\sigma_i$, the user fixes a perplexity, and a binary search finds the $\sigma_i$ for which $P_i$ has that perplexity:

     $$\mathrm{Perp}(P_i) = 2^{H(P_i)}, \qquad H(P_i) = -\sum_{j} p_{j \mid i} \log_2 p_{j \mid i}$$

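  1. Convert the low-dimensional datapoints $y_i$ into conditional probabilities (similarities) in the same way, but with $\sigma$ fixed at $1/\sqrt{2}$ for every point, so that $2\sigma^2 = 1$:

     $$q_{j \mid i} = \frac{\exp\left(-\lVert y_i - y_j \rVert^2\right)}{\sum_{k \neq i} \exp\left(-\lVert y_i - y_k \rVert^2\right)}$$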

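  1. Use gradient descent to minimize the Kullback-Leibler divergence (KL divergence) between the high-dimensional distributions $P_i$ and the low-dimensional distributions $Q_i$, summed over all datapoints:

     $$C = \sum_{i} \mathrm{KL}(P_i \,\Vert\, Q_i) = \sum_{i} \sum_{j} p_{j \mid i} \log \frac{p_{j \mid i}}{q_{j \mid i}}$$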
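The SNE steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function names, the tolerance, the iteration cap, and the choice to search over `beta = 1 / (2 * sigma_i^2)` instead of `sigma_i` directly are all assumptions made for the example.

```python
import numpy as np

def squared_distances(Z):
    """Pairwise squared Euclidean distances between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)

def conditional_probabilities(X, perplexity=30.0, tol=1e-5, max_iter=50):
    """Return P with P[i, j] = p_{j|i}, one sigma_i per row (hypothetical helper)."""
    n = X.shape[0]
    D = squared_distances(X)
    target_entropy = np.log2(perplexity)
    P = np.zeros((n, n))
    for i in range(n):
        d_i = np.delete(D[i], i)  # distances from point i to all other points
        # Binary search over beta = 1 / (2 * sigma_i^2); larger beta, lower entropy.
        beta, beta_lo, beta_hi = 1.0, 0.0, np.inf
        for _ in range(max_iter):
            w = np.exp(-(d_i - d_i.min()) * beta)  # shifted for numerical stability
            p = w / w.sum()
            entropy = -np.sum(p * np.log2(p + 1e-12))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:
                # Distribution too flat: shrink sigma_i (increase beta).
                beta_lo = beta
                beta = beta * 2.0 if np.isinf(beta_hi) else (beta + beta_hi) / 2.0
            else:
                # Distribution too peaked: grow sigma_i (decrease beta).
                beta_hi = beta
                beta = (beta + beta_lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P

def low_dim_conditionals(Y):
    """Return Q with Q[i, j] = q_{j|i}, using 2 * sigma^2 = 1 for every point."""
    D = squared_distances(Y)
    np.fill_diagonal(D, np.inf)  # a point is never its own neighbor
    W = np.exp(-(D - D.min(axis=1, keepdims=True)))
    return W / W.sum(axis=1, keepdims=True)

def sne_cost(P, Q, eps=1e-12):
    """Sum over i of KL(P_i || Q_i)."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

# Usage on random data (a stand-in, not MNIST).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
Y = rng.normal(scale=1e-2, size=(40, 2))
P = conditional_probabilities(X, perplexity=10.0)
Q = low_dim_conditionals(Y)
print("SNE cost:", sne_cost(P, Q))
```

Searching over `beta` keeps the update rule simple (double until an upper bound is found, then bisect), because the entropy of `P_i` decreases monotonically as `beta` grows.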


t-SNE uses a symmetrized version of the SNE cost function and a Student-t distribution to compute the similarities between low-dimensional datapoints.

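  1. Symmetrized cost function: the conditional probabilities are replaced by joint probabilities, where $n$ is the number of datapoints

     $$p_{ij} = \frac{p_{j \mid i} + p_{i \mid j}}{2n}$$

  1. Student-t distribution: low-dimensional similarities use a Student-t distribution with one degree of freedom

     $$q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}$$

  1. KL-divergence

     $$C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i} \sum_{j} p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

  1. Gradient

     $$\frac{\partial C}{\partial y_i} = 4 \sum_{j} (p_{ij} - q_{ij})(y_i - y_j)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}$$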
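The four t-SNE ingredients listed above can be sketched in NumPy as follows. The function names are hypothetical, and the random row-stochastic matrix in the usage lines is only a stand-in for real conditional probabilities `p_{j|i}`. The gradient assumes `P` is symmetric and sums to 1.

```python
import numpy as np

def joint_probabilities(P_cond):
    """Symmetrize conditionals: p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)

def student_t_affinities(Y):
    """Return (Q, inv), where inv[i, j] = 1 / (1 + ||y_i - y_j||^2)."""
    sq = np.sum(Y ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
    inv = 1.0 / (1.0 + D)
    np.fill_diagonal(inv, 0.0)  # q_ii is defined as zero
    return inv / inv.sum(), inv

def kl_divergence(P, Q, eps=1e-12):
    """C = sum_ij p_ij * log(p_ij / q_ij), skipping entries with p_ij = 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

def tsne_gradient(P, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + ||y_i - y_j||^2)."""
    Q, inv = student_t_affinities(Y)
    W = (P - Q) * inv
    # sum_j W_ij (y_i - y_j) = (sum_j W_ij) * y_i - sum_j W_ij * y_j
    return 4.0 * (W.sum(axis=1)[:, None] * Y - W @ Y)

# Usage with stand-in conditional probabilities (random, rows sum to 1).
rng = np.random.default_rng(0)
A = rng.random((10, 10))
np.fill_diagonal(A, 0.0)
P = joint_probabilities(A / A.sum(axis=1, keepdims=True))
Y = rng.normal(size=(10, 2))
Q, _ = student_t_affinities(Y)
print("KL:", kl_divergence(P, Q))
print("gradient shape:", tsne_gradient(P, Y).shape)
```

Writing the gradient as a row sum minus a matrix product avoids building the full `n x n x 2` array of pairwise differences.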


  1. Momentum: 0.9
  2. Learning rate: 15
  3. Iterations: 500
  4. Data: MNIST
  5. Result: snapshots of the embedding at iter 0, 100, 200, 300, 400, and 499
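A training loop with the settings listed above (momentum 0.9, learning rate 15, 500 iterations) might look like the sketch below. Several parts are assumptions made to keep the example small and self-contained: the data is a two-cluster toy set instead of MNIST, `P` is built from a single fixed-bandwidth Gaussian kernel instead of the per-point perplexity search, and the names `train`, `student_t_q`, and `kl` are hypothetical.

```python
import numpy as np

def student_t_q(Y):
    """Return (Q, inv) for the Student-t similarities of the rows of Y."""
    sq = np.sum(Y ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
    inv = 1.0 / (1.0 + D)
    np.fill_diagonal(inv, 0.0)
    return inv / inv.sum(), inv

def kl(P, Q, eps=1e-12):
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / (Q[mask] + eps))))

def train(P, n_iter=500, learning_rate=15.0, momentum=0.9, seed=0):
    """Gradient descent with momentum on KL(P || Q); returns (Y, costs)."""
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(P.shape[0], 2))  # small random start
    velocity = np.zeros_like(Y)
    costs = []
    for _ in range(n_iter):
        Q, inv = student_t_q(Y)
        costs.append(kl(P, Q))  # cost before this iteration's update
        W = (P - Q) * inv
        grad = 4.0 * (W.sum(axis=1)[:, None] * Y - W @ Y)
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    return Y, costs

# Toy stand-in for MNIST: two Gaussian clusters in 10 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 10)), rng.normal(6.0, 1.0, (30, 10))])
sq = np.sum(X ** 2, axis=1)
D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
A = np.exp(-D / D.mean())  # simplification: one shared bandwidth for all points
np.fill_diagonal(A, 0.0)
P = A / A.sum()            # symmetric joint probabilities summing to 1

Y, costs = train(P)
print("KL at iter 0:", costs[0], "KL at iter 499:", costs[-1])
```

Recording the cost at every iteration makes it easy to take snapshots at the same iterations as the result figures.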


