t-Distributed Stochastic Neighbor Embedding (t-SNE) is a dimensionality reduction algorithm derived from Stochastic Neighbor Embedding (SNE). It maps high-dimensional data to a low-dimensional space, preserving local structure while also revealing some global structure, such as clusters.
- Python
- PyCharm
- Convert the pairwise distances of the high-dimensional data into conditional probabilities (similarities), assuming each datapoint picks its neighbors according to a Gaussian distribution centered on itself.
- Each high-dimensional datapoint i has its own variance σ_i, which reflects how dense or sparse the region around it is. A variance σ_i induces a probability distribution P_i over the other datapoints. To select a proper σ_i for each i, the user sets a fixed perplexity, and a binary search finds the σ_i for which P_i has that perplexity.
- Convert the low-dimensional data into conditional probabilities (similarities) in the same way, but set the variance to 1/√2.
- Use gradient descent to minimize the Kullback-Leibler divergence (KL divergence) between these two distributions.
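
The first two steps can be sketched in plain Python. This is a minimal illustration rather than the implementation in this repository: the function names (`conditional_probs`, `perplexity`, `find_sigma`), the search bracket, and the tolerance are all assumptions, and only the standard library is used so the sketch runs without extra dependencies.

```python
import math

def conditional_probs(sq_dists, i, sigma):
    """Return p_{j|i} for all j, given squared distances from point i."""
    # Gaussian kernel evaluated in log space.
    logits = [-d / (2.0 * sigma ** 2) for d in sq_dists]
    # Shift by the largest exponent (excluding i itself) for numerical stability.
    top = max(l for j, l in enumerate(logits) if j != i)
    # A point is never its own neighbor, so p_{i|i} = 0.
    weights = [0.0 if j == i else math.exp(l - top)
               for j, l in enumerate(logits)]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(p):
    """Perplexity 2**H(p), with H the Shannon entropy in bits."""
    entropy = -sum(v * math.log2(v) for v in p if v > 0.0)
    return 2.0 ** entropy

def find_sigma(sq_dists, i, target, tol=1e-5, max_iter=200):
    """Binary-search sigma_i so that P_i reaches the target perplexity."""
    lo, hi = 1e-10, 1e10  # assumed search bracket
    sigma = 1.0
    for _ in range(max_iter):
        sigma = math.sqrt(lo * hi)  # bisect on a log scale
        perp = perplexity(conditional_probs(sq_dists, i, sigma))
        if abs(perp - target) < tol:
            break
        if perp > target:
            hi = sigma  # distribution too flat: shrink sigma
        else:
            lo = sigma  # distribution too peaked: grow sigma
    return sigma
```

The search works because perplexity grows monotonically with σ_i: a wider Gaussian spreads probability over more neighbors. The target perplexity must lie between 1 and n - 1 for n datapoints, otherwise no σ_i can reach it.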
t-SNE uses a symmetrized version of the SNE cost function and a Student-t distribution to compute the similarities between low-dimensional datapoints.
- Symmetrized cost function
- Student-t distribution
- KL-divergence
- Gradient
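
The four pieces listed above can be sketched together in plain Python. This is a hedged illustration based on the standard t-SNE formulation, not necessarily the code in this repository: the function names are hypothetical, `P_cond` is assumed to be the n × n matrix of conditional probabilities p_{j|i} with a zero diagonal, and `Y` is assumed to be a list of low-dimensional points.

```python
import math

def symmetrize(P_cond):
    """Symmetrized joint probabilities p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = len(P_cond)
    return [[(P_cond[i][j] + P_cond[j][i]) / (2.0 * n) for j in range(n)]
            for i in range(n)]

def student_t_kernel(Y):
    """Unnormalized Student-t weights (one degree of freedom):
    w_ij = 1 / (1 + ||y_i - y_j||^2), with w_ii = 0."""
    n = len(Y)
    return [[0.0 if i == j else
             1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
             for j in range(n)] for i in range(n)]

def low_dim_q(W):
    """Joint probabilities q_ij: the kernel normalized over all pairs."""
    total = sum(map(sum, W))
    return [[w / total for w in row] for row in W]

def kl_divergence(P, Q):
    """KL(P || Q) = sum_ij p_ij * log(p_ij / q_ij).
    Zero entries of P (including the diagonal) contribute nothing."""
    return sum(p * math.log(p / q)
               for rp, rq in zip(P, Q) for p, q in zip(rp, rq) if p > 0.0)

def gradient(P, Q, W, Y):
    """dC/dy_i = 4 * sum_j (p_ij - q_ij) * w_ij * (y_i - y_j)."""
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            coef = 4.0 * (P[i][j] - Q[i][j]) * W[i][j]
            for d in range(dim):
                grad[i][d] += coef * (Y[i][d] - Y[j][d])
    return grad
```

A plain gradient-descent update would then move each point against its gradient, for example `Y[i][d] -= learning_rate * grad[i][d]`, where `learning_rate` is an assumed hyperparameter. The nested loops keep the sketch readable. They cost O(n²) per step, so a vectorized version would be preferable for anything beyond small inputs.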