geneMap

MATLAB implementation of a self-organizing map (SOM) for HGDP SNP genotyping data. For comparison with other dimensionality reduction methods, see L. van der Maaten's Matlab Toolbox for Dimensionality Reduction, available here.

###Including

  1. Preprocessed HGDP data
  • Original data is taken from Stanford's Human Genome Diversity Project website.
  • 1,043 samples with 660,918 dimensions, reduced to 1,043 dimensions (mapped to sample space). The data can be reduced further with PCA.
  • TODO: details of data preprocessing (missing values, two alleles)
  2. Standard batch ordering and convergence phases of the algorithm
  • Default parameters follow the recommendations in T. Kohonen's MATLAB Implementations and Applications of the Self-Organizing Map, available here.
  3. Video tracking of weight ordering in the first two principal components
  • While the code runs, it generates a video showing the SOM grid projected onto the first two principal components.
  4. Plots of the mapped data
  • Comparison of the training data mappings to the nodes of the grid, distinguishing samples by Regions and Countries/Populations.
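
The batch training with an ordering phase followed by a convergence phase can be sketched roughly as below. This is a minimal illustration on toy data, not the repository's code: the rectangular grid, the Gaussian neighborhood, the linear radius schedule, and every name and parameter value (`gridSize`, `phases`, `sigma`, and so on) are assumptions made for the example. It uses only base MATLAB and relies on implicit expansion (R2016b or later).

```matlab
% Minimal batch SOM sketch. All names and parameter values are
% illustrative assumptions, not the repository's defaults.
rng(0);                                   % reproducible toy example
X = [randn(100, 5); randn(100, 5) + 4];   % toy data: N samples x D features
[N, D] = size(X);

gridSize = [6 6];                         % assumed rectangular grid of nodes
[gr, gc] = ndgrid(1:gridSize(1), 1:gridSize(2));
pos = [gr(:), gc(:)];                     % K x 2 node coordinates on the grid
K = size(pos, 1);
% K x K squared distances between nodes, measured on the grid
gridD2 = sum((permute(pos, [1 3 2]) - permute(pos, [3 1 2])).^2, 3);

W = X(randperm(N, K), :);                 % initialize weights from random samples

% Each row: [number of epochs, start radius, end radius]
phases = [20, max(gridSize) / 2, 1;       % ordering: wide, shrinking neighborhood
          40, 1,                 0.5];    % convergence: narrow neighborhood

for p = 1:size(phases, 1)
    nEpochs = phases(p, 1);
    for t = 1:nEpochs
        % Linearly interpolate the neighborhood radius within the phase
        frac  = (t - 1) / max(nEpochs - 1, 1);
        sigma = phases(p, 2) + (phases(p, 3) - phases(p, 2)) * frac;
        % Squared Euclidean distance from every sample to every node (N x K)
        d2 = sum(X.^2, 2) - 2 * X * W' + sum(W.^2, 2)';
        [~, bmu] = min(d2, [], 2);        % best-matching unit for each sample
        % K x N Gaussian neighborhood weights around each sample's BMU
        H = exp(-gridD2(:, bmu) / (2 * sigma^2));
        W = (H * X) ./ sum(H, 2);         % batch update: weighted mean of samples
    end
end

% Map each training sample to its final best-matching node
d2 = sum(X.^2, 2) - 2 * X * W' + sum(W.^2, 2)';
[~, bmu] = min(d2, [], 2);
hits = accumarray(bmu, 1, [K 1]);         % number of samples mapped to each node
```

The vector `bmu` gives the grid node assigned to each sample, which is the kind of information a plot of mapped data by region or population would be built from. `hits` counts how many samples land on each node.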

###TODO

  • Combine SOM updates with k-means updates to reduce issues at the edges of the map.
  • Adaptively add nodes to the grid during training.
  • Map to the surface instead of to a node of the grid.
  • Hexagonal connectivity (?)
  • Implement a comparison of representations à la the Al-Oqaily & Kennedy 2008 paper.

###Visualization examples:

  1. PCA
  2. tSNE
  3. SOM: two of the most common maps, with respect to the relative orientation of regions. The left map was the result in about 75% of runs, and the right map in 20-25%.
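
For the PCA view, and for the reduction to sample space mentioned under "Including", one common approach when there are far fewer samples than dimensions is to diagonalize the N x N Gram matrix instead of the D x D covariance matrix. The sketch below illustrates that idea on random toy data. It is an assumption about how such a reduction can be done, not a description of the repository's preprocessing, and all variable names are hypothetical.

```matlab
% Sample-space PCA sketch (base MATLAB, R2016b+ for implicit expansion).
% Sizes and names are illustrative assumptions.
rng(1);
N = 50; D = 2000;                 % toy stand-in for 1,043 samples x 660,918 SNPs
X = randn(N, D);

Xc = X - mean(X, 1);              % center each feature
G  = Xc * Xc';                    % N x N Gram matrix, cheap when N << D
G  = (G + G') / 2;                % symmetrize against round-off
[V, E] = eig(G);
[ev, order] = sort(diag(E), 'descend');
V  = V(:, order);
ev = max(ev, 0);                  % clip tiny negative eigenvalues from round-off
scores = V .* sqrt(ev)';          % N x N coordinates with the same Gram matrix as Xc

pc12 = scores(:, 1:2);            % first two principal components, e.g. for plotting
```

Because `scores * scores'` reproduces `G` up to round-off, pairwise distances between the centered samples are preserved in the N-dimensional representation, and keeping only the leading columns of `scores` gives the usual PCA truncation.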