


MATLAB implementation of a self-organizing map (SOM) for HGDP SNP genotyping data. For a comparison with other dimensionality reduction methods, see L. van der Maaten's Matlab Dimensionality Reduction toolbox, available here.



  1. Preprocessed HGDP data
  • The original data is taken from Stanford's Human Genome Diversity Project website.
  • 1,043 samples with 660,918 dimensions, reduced to 1,043 dimensions (mapped to sample space). Can be further reduced by PCA.
  • TODO: details of data preprocessing (missing values, two alleles)
  2. Standard batch ordering and convergence phases of the algorithm
  • Default parameters follow the recommendations in T. Kohonen's MATLAB Implementations and Applications of the Self-Organizing Map, available here.
  3. Video tracking of weight ordering in the first two principal components
  • While the code runs, a video is generated showing the SOM grid projected onto the first two principal components.
  4. Plots of the mapped data
  • Comparison of the training data mappings to the nodes of the grid, distinguishing samples by Regions and Countries/Populations.
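
The batch training with ordering and convergence phases, and the projection of the grid onto the first two principal components, can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the toy data, the grid size, the `phases` schedule, and all variable names are assumptions chosen for the example, and the defaults recommended by Kohonen may differ.

```matlab
% Toy stand-in for the preprocessed data: n samples x d features (assumption).
rng(0);
X = [randn(100, 5); randn(100, 5) + 4];
[n, d] = size(X);

% Rectangular grid of nodes; gridPos holds each node's (row, col) position.
rows = 6; cols = 6; m = rows * cols;
[gc, gr] = meshgrid(1:cols, 1:rows);
gridPos = [gr(:), gc(:)];
g2 = sum(gridPos.^2, 2);
% Squared grid distance between every pair of nodes (m x m).
gridD2 = max(bsxfun(@plus, g2, g2') - 2 * (gridPos * gridPos'), 0);

% Initialize the weights from randomly chosen samples.
W = X(randperm(n, m), :);

% Hypothetical schedule: each row is [iterations, sigma_start, sigma_end].
phases = [20, 3.0, 1.0;    % ordering phase: wide, shrinking neighborhood
          40, 1.0, 0.5];   % convergence phase: narrow neighborhood

for p = 1:size(phases, 1)
    sigmas = linspace(phases(p, 2), phases(p, 3), phases(p, 1));
    for t = 1:numel(sigmas)
        % Best-matching unit (BMU) of every sample under the current weights.
        D2 = bsxfun(@plus, sum(X.^2, 2), sum(W.^2, 2)') - 2 * (X * W');
        [~, bmu] = min(D2, [], 2);

        % Gaussian neighborhood between nodes on the grid.
        H = exp(-gridD2 / (2 * sigmas(t)^2));

        % Batch update: each node becomes the neighborhood-weighted mean
        % of all samples, weighted by H(node, BMU of the sample).
        A = H(:, bmu);                               % m x n
        W = bsxfun(@rdivide, A * X, sum(A, 2));
    end
end

% Project the trained grid onto the first two principal components of X.
mu = mean(X, 1);
[~, ~, V] = svd(bsxfun(@minus, X, mu), 'econ');
P = bsxfun(@minus, W, mu) * V(:, 1:2);               % m x 2 node coordinates
```

Plotting `P` after each iteration, with lines drawn between neighboring grid nodes, would give one frame of a video like the one described in item 3.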


  • Combine SOM updates with k-means updates to reduce issues at the edges of the map.
  • Adaptively add nodes to the grid during training.
  • Map samples to a surface instead of to a node of the grid.
  • Hexagonal connectivity (?)
  • Implement a comparison of representations à la the Al-Oqaily & Kennedy 2008 paper.
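
As a rough illustration of the hexagonal connectivity item, the sketch below builds node coordinates for a hexagonal lattice and counts the nearest neighbors of each node. It is only a sketch under assumed names and sizes (`hexPos`, a 4 x 5 grid), not part of the existing code. Shifting every second row by half a cell and spacing the rows by sqrt(3)/2 places each interior node at unit distance from six neighbors, compared with four on a rectangular grid.

```matlab
% Hypothetical grid size for the illustration.
rows = 4; cols = 5;
[c, r] = meshgrid(1:cols, 1:rows);

% Hexagonal lattice: shift every second row by 0.5, space rows by sqrt(3)/2.
x = c(:) + 0.5 * mod(r(:) - 1, 2);
y = (r(:) - 1) * sqrt(3) / 2;
hexPos = [x, y];

% Pairwise Euclidean distances between node positions.
s = sum(hexPos.^2, 2);
D = sqrt(max(bsxfun(@plus, s, s') - 2 * (hexPos * hexPos'), 0));

% Nodes at unit distance are direct neighbors on the lattice.
nNbr = sum(abs(D - 1) < 1e-9, 2);
```

Using distances computed from `hexPos` in the neighborhood function, instead of distances on a rectangular grid, would be one way to switch the map to hexagonal connectivity.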

### Visualization examples:

  1. PCA
  2. t-SNE
  3. SOM: two of the most common maps, with respect to the relative orientation of regions. The left map resulted in about 75% of runs, and the right map in 20-25%.