/t-SNE-Analysis-all-intermediate-embeddings

Fork of the repo for extra data manipulation of t-SNE optimization data over time.

Primary language: C++. License: GNU Lesser General Public License v3.0 (LGPL-3.0).

t-SNE & HSNE Analysis

t-SNE and HSNE analysis plugins for ManiVault.

git clone git@github.com:ManiVaultStudio/t-SNE-Analysis.git

This project builds two plugins, one for t-SNE and one for HSNE, which wrap the functionality of the HDILib.

Figure: t-SNE and HSNE embeddings. Left: t-SNE embedding of 10k MNIST test data. Center: (top) HSNE top scale embedding of the same data, (bottom) two refinements of overlapping top-level selections. Right: HSNE setting panels.

Compilation with a locally built HDILib

By default, during the cmake configuration, a pre-built version of HDILib will be downloaded. You might want to compile the HDILib locally instead. To use this locally compiled library, set the cmake variable USE_ARTIFACTORY_LIBS to OFF and provide HDILIB_ROOT, e.g. PATH_TO_HDILib_install\lib\cmake\HDILib for cmake to find the HDILib binaries.

On Windows, we recommend using vcpkg to manage the HDILib dependency flann. Set up cmake to find packages with vcpkg by providing the variables CMAKE_TOOLCHAIN_FILE (PATH_TO/vcpkg/scripts/buildsystems/vcpkg.cmake) and VCPKG_TARGET_TRIPLET (x64-windows-static).

Tested with Ubuntu 22.10, gcc 12.2.0:

# In your local t-SNE analysis plugin folder
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DUSE_ARTIFACTORY_LIBS=OFF -DHDILIB_ROOT=/PATH/TO/YOUR/LOCALHDILIB -DMV_INSTALL_DIR=/PATH/TO/MANIVAULT
cmake --build build --config Release --target install

Notes on settings

  • Exaggeration factor: Defaults to 4 + number of points / 60'000
  • Initialization:
    • Defaults to random. Optional: Use another data set as the initial embedding coordinates, e.g. the first two PCA components.
    • Defaults to rescaling the initial coordinates such that the first embedding dimension has a standard deviation of 0.0001. If turned off, the random initialization will uniformly sample coordinates from a circle with radius 1.
    • See e.g. The art of using t-SNE for single-cell transcriptomics for more details on recommended t-SNE settings
  • Gradient Descent:
    • GPU-based implementation (default) requires OpenGL 3.3 and benefits from compute shaders (introduced in OpenGL 4.3 and not available on Apple devices)
    • CPU-based implementation of Barnes-Hut t-SNE automatically sets θ to min(0.5, max(0.0, (numPoints - 1000.0) * 0.00005))
    • Changes to gradient descent parameters take effect only when "reinitializing" the gradient descent, not when "continuing" it
  • kNN (specify search structure construction and query characteristics):
    • (Annoy) Trees & Checks: correspond to n_trees and search_k, see the Annoy docs
    • (HNSW) M & ef: see the HNSW docs for details
  • HSNE:
    • The number of scales includes the data scale, i.e., a setting of 2 scales indicates one abstraction scale above the data scale. Specifying 1 scale will not compute any abstraction level.
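
The default values described in the notes above can be written out as small helper functions. This is a minimal sketch, not the plugin's actual code: the function names (defaultExaggeration, barnesHutTheta, rescaleInitialization) are hypothetical, the exaggeration division is assumed to be floating point, and the rescaling is assumed to use the population standard deviation of the first dimension on interleaved 2D coordinates without centering.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Default exaggeration factor: 4 + number of points / 60'000.
// Assumption: floating-point division, so small data sets get a value
// slightly above 4 rather than exactly 4.
double defaultExaggeration(std::size_t numPoints) {
    return 4.0 + static_cast<double>(numPoints) / 60000.0;
}

// Barnes-Hut theta for the CPU implementation:
// min(0.5, max(0.0, (numPoints - 1000.0) * 0.00005)).
// The subtraction is done in double so that fewer than 1000 points
// yields a negative value (clamped to 0) instead of unsigned wrap-around.
double barnesHutTheta(std::size_t numPoints) {
    const double raw = (static_cast<double>(numPoints) - 1000.0) * 0.00005;
    return std::min(0.5, std::max(0.0, raw));
}

// Rescale an initial 2D embedding so that its first dimension has the
// given standard deviation (0.0001 by default, as in the notes above).
// Assumptions: coordinates are interleaved (x0, y0, x1, y1, ...), the
// population standard deviation is used, and both dimensions are scaled
// by the same factor without centering.
void rescaleInitialization(std::vector<float>& coords, double targetStd = 0.0001) {
    const std::size_t n = coords.size() / 2;
    if (n == 0) return;

    // Mean of the first embedding dimension.
    double mean = 0.0;
    for (std::size_t i = 0; i < n; ++i) mean += coords[2 * i];
    mean /= static_cast<double>(n);

    // Population variance of the first embedding dimension.
    double var = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double d = coords[2 * i] - mean;
        var += d * d;
    }
    var /= static_cast<double>(n);

    const double sd = std::sqrt(var);
    if (sd <= 0.0) return;  // all points share one x value; nothing to rescale

    const float scale = static_cast<float>(targetStd / sd);
    for (float& c : coords) c *= scale;
}
```

For the 10k MNIST test data shown in the figure, these formulas give an exaggeration factor of about 4.17 and θ = 0.45. θ reaches its cap of 0.5 at 11,000 points and is 0 (exact gradient) for 1,000 points or fewer.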