Implementation of a 2D self-organizing map (SOM), with support for NumPy arrays and Pandas DataFrames. Most features are implemented with NumPy; Scikit-learn is used for standardization and PCA operations.
- Stepwise and batch training
- Random weight initialization
- Random sampling weight initialization
- Linear weight initialization (with PCA)
- Automatic selection of map size ratio (with PCA)
- Support for cyclic arrays, for toroidal or spherical maps
- Gaussian and Bubble neighborhood functions
- Support for custom decay functions (see the sketch after this list)
- Support for visualization (U-matrix, activation matrix)
- Support for supervised learning (label map)
- Support for NumPy arrays, Pandas DataFrames and regular lists of values
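As a minimal sketch of the custom decay support mentioned above: the `decay_function` keyword and the `(value, iteration, max_iterations)` signature used here are assumptions, not the library's documented interface; check the docstrings of the built-in decay functions (`_asymptotic_decay`, `_linear_decay`, `_exponential_decay`, `_inverse_decay`) in the source code for the exact signature expected by the SOM constructor.

```python
import numpy as np
import python_som

# Hypothetical custom decay: the (value, iteration, max_iterations) signature
# and the `decay_function` keyword below are assumptions; verify them against
# the built-in decay functions in the source code before use.
def custom_exponential_decay(value, iteration, max_iterations):
    """Decay `value` exponentially to about 37% of its start by the last iteration."""
    return value * np.exp(-iteration / max_iterations)

som = python_som.SOM(x=20, y=20, input_len=4, learning_rate=0.5,
                     neighborhood_radius=2.0, neighborhood_function='gaussian',
                     decay_function=custom_exponential_decay)  # assumed keyword
```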
The following code excerpt (also available in test.py) shows an example of instantiating and training a SOM with the Iris dataset:

```python
# Import python_som
import python_som
# Import NumPy and Pandas for storing data
import numpy as np
import pandas as pd
# Import libraries for plotting results
import matplotlib.pyplot as plt
import seaborn as sns
# Load Iris dataset and columns of features and labels
iris = sns.load_dataset('iris')
target = iris.iloc[:, -1].to_numpy()
iris = iris.iloc[:, :-1].to_numpy()
# Transform labels into numeric codes for plotting
tg = np.zeros(len(target), dtype=int)
tg[target == 'setosa'] = 0
tg[target == 'versicolor'] = 1
tg[target == 'virginica'] = 2
# Instantiate SOM from python_som
# Selecting shape automatically (providing dataset for constructor)
# Using default decay and distance functions
# Using gaussian neighborhood function
# Using cyclic arrays in the vertical and horizontal directions
som = python_som.SOM(x=20, y=None, input_len=iris.shape[1], learning_rate=0.5, neighborhood_radius=1.0,
                     neighborhood_function='gaussian', cyclic_x=True, cyclic_y=True, data=iris)
# Initialize weights of the SOM with linear initialization
som.weight_initialization(mode='linear', data=iris)
# Training SOM with default number of iterations
# Using batch learning process
som.train(data=iris, n_iteration=len(iris), mode='batch', verbose=True)
# Calculating distance matrix for plotting
umatrix = som.distance_matrix().T
# Plotting U-matrix with seaborn/matplotlib
plt.figure(figsize=som.get_shape())
plt.pcolor(umatrix, cmap='bone_r')
markers = ['o', 's', 'D']
colors = ['C0', 'C1', 'C2']
for cnt, xx in enumerate(iris):
    w = som.winner(xx)  # getting the winner node for this instance
    plt.plot(w[0] + .5, w[1] + .5, markers[tg[cnt]], markerfacecolor='None',
             markeredgecolor=colors[tg[cnt]], markersize=12, markeredgewidth=2)
plt.axis([0, som.get_shape()[0], 0, som.get_shape()[1]])
plt.show()
```
The following image is generated by the previous test code. It shows the U-matrix of the trained SOM together with the distribution of the Iris instances: each instance is plotted on its winning node, with a marker and color for each label:
- Setosa: blue circle
- Versicolor: orange square
- Virginica: green diamond
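The same workflow could use the other training and initialization modes listed in the feature list (stepwise training, random-sampling initialization). The lines below are a hedged variant of the excerpt, not documented usage: the mode strings are guesses inferred from the feature list, so check the docstrings of SOM.weight_initialization and SOM.train for the values actually accepted.

```python
# Variant of the excerpt above; continues from the same `som` and `iris` objects.
# NOTE: the mode strings 'random_sampling' and 'stepwise' are assumptions
# inferred from the feature list, not documented values.
som.weight_initialization(mode='random_sampling', data=iris)
som.train(data=iris, n_iteration=len(iris), mode='stepwise', verbose=True)
```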
The following is a list of the methods and functions currently available in the module and the SOM class. The full documentation of each one can be found in the source code:
- _asymptotic_decay
- _linear_decay
- _exponential_decay
- _inverse_decay
- _euclidean_distance
- SOM
- SOM.get_shape
- SOM.get_weights
- SOM.set_learning_rate
- SOM.set_neighborhood_radius
- SOM.activate
- SOM.winner
- SOM.quantization
- SOM.quantization_error
- SOM.distance_matrix
- SOM.activation_matrix
- SOM.winner_map
- SOM.label_map
- SOM.train
- SOM.weight_initialization
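As a quick illustration of how a few of these methods might be combined after training (continuing the Iris excerpt above), the sketch below is hedged: the argument lists of SOM.quantization_error and SOM.label_map are assumptions inferred from their names, so consult the docstrings in the source code for the actual signatures.

```python
# Continues from the trained `som` and the `iris`/`target` arrays above.
# NOTE: the argument lists of quantization_error and label_map are assumptions
# inferred from the method names; check the source docstrings before use.

# Average distance between each instance and its best-matching unit
# (assumed to take the dataset as its only argument).
print('Quantization error:', som.quantization_error(iris))

# Winning node coordinates per instance (som.winner is used the same way
# in the excerpt above).
winners = [som.winner(x) for x in iris]

# Per-node label information for supervised use (assumed to take the data
# and the corresponding labels).
labels = som.label_map(iris, target)
```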
This implementation is based on the following paper by Professor Teuvo Kohonen:
Teuvo Kohonen, Essentials of the self-organizing map, Neural Networks, Volume 37, 2013, Pages 52-65, ISSN 0893-6080, https://doi.org/10.1016/j.neunet.2012.09.018.