MATLAB implementation of a Self-Organizing Map (SOM) for HGDP SNP genotyping data. For comparison with other dimensionality reduction methods, see L. van der Maaten's Matlab Toolbox for Dimensionality Reduction, available here.
###Including
- Preprocessed HGDP data
  - Original data are taken from Stanford's Human Genome Diversity Project website.
  - 1,043 samples with 660,918 dimensions, reduced to 1,043 dimensions (mapped to sample space). Can be reduced further by PCA.
  - TODO: details of data preprocessing (missing values, two alleles)
- Standard batch algorithm with ordering and convergence phases
  - Default parameters follow the recommendations in T. Kohonen's MATLAB Implementations and Applications of the Self-Organizing Map, available here.
- Video tracking of weight ordering in the first two principal components
  - While the code runs, it generates a video showing the SOM grid projected onto the first two principal components.
- Plots of the mapped data
  - Comparison of how the training data map to the nodes of the grid, distinguishing samples by Regions and Countries/Populations.
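
The batch training and the principal-component projection listed above can be sketched as follows. This is a minimal illustration on synthetic data, not the code shipped in this repository: the grid size, epoch counts, neighbourhood radii and all variable names are assumptions chosen for the example, and the radii are not Kohonen's recommended defaults.

```matlab
% Toy batch SOM with an ordering phase and a convergence phase.
% All names and parameter values are illustrative assumptions.
rng(0);                                        % reproducible toy run
X = [randn(100, 5) + 3; randn(100, 5) - 3];    % 200 synthetic samples, 5 features
[n, d] = size(X);

% Rectangular grid of nodes and pairwise squared distances on the grid
rows = 6; cols = 6; m = rows * cols;
[gc, gr] = meshgrid(1:cols, 1:rows);
G = [gr(:), gc(:)];                            % m-by-2 grid coordinates
sqG = sum(G.^2, 2);
gridD2 = bsxfun(@plus, sqG, sqG') - 2 * (G * G');   % m-by-m

% Initialize node weights from randomly chosen samples
W = X(randperm(n, m), :);                      % m-by-d

% Phase schedule: [epochs, start radius, end radius]
phases = [20, max(rows, cols) / 2, 1;          % ordering: wide, shrinking radius
          40, 1,                   0.5];       % convergence: narrow radius

for p = 1:size(phases, 1)
    nEpochs = phases(p, 1);
    sigmas = linspace(phases(p, 2), phases(p, 3), nEpochs);
    for t = 1:nEpochs
        % Best-matching unit (BMU) of every sample
        D2 = bsxfun(@plus, sum(X.^2, 2), sum(W.^2, 2)') - 2 * (X * W');
        [~, bmu] = min(D2, [], 2);             % n-by-1 node indices

        % Gaussian neighbourhood between each node and each sample's BMU
        H = exp(-gridD2(:, bmu) / (2 * sigmas(t)^2));    % m-by-n

        % Batch update: neighbourhood-weighted mean of all samples
        W = bsxfun(@rdivide, H * X, sum(H, 2));
    end
end

% Project samples and node weights onto the first two principal components
mu = mean(X, 1);
[~, ~, V] = svd(bsxfun(@minus, X, mu), 'econ');
P  = V(:, 1:2);                                % d-by-2 principal directions
Xp = bsxfun(@minus, X, mu) * P;                % n-by-2 projected samples
Wp = bsxfun(@minus, W, mu) * P;                % m-by-2 projected grid nodes
```

Plotting `Wp` over `Xp` after each epoch and capturing the figure with `getframe` and `VideoWriter` is one way to produce a video of the grid unfolding. The sketch uses only base MATLAB functions, so it does not depend on any toolbox.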
###TODO
- Combine SOM updates with k-means updates to reduce issues at the edges of the map.
- Adaptively add nodes to the grid during training.
- Map to a surface instead of to the nodes of the grid.
- Hexagonal connectivity (?)
- Implement a comparison of representations à la Al-Oqaily & Kennedy (2008).
###Visualization examples: