A fast Self-Organizing Map Python library implemented in Numba.
This is a fast and simple to use SOM library. It utilizes online training (one data point at the time) rather than batch training. The implemented topologies are a simple 2D lattice or a torus.
To install this package with pip run:
pip install numbasom
To install this package with conda run:
conda install -c mnikola numbasom
To import the library you can safely use:
from numbasom import *
A Self-Organizing Map is often used to show the underlying structure in data. To show how to use the library, we will train it on 200 random 3-dimensional vectors (so we can render them as colors):
import numpy as np
data = np.random.random([200,3])
We initalize a map with 50 rows and 100 columns. The default topology is a 2D lattice. We can also train it on a torus by setting is_torus=True
som = SOM(som_size=(50,100), is_torus=False)
We will adapt the lattice by iterating 10.000 times through our data points. If we set normalize=True
, data will be normalized before training.
lattice = som.train(data, num_iterations=15000)
SOM training took: 0.366633 seconds.
lattice[5,3]
array([0.92550425, 0.20740594, 0.92610555])
lattice[1::6,1]
array([[0.7473038 , 0.09876245, 0.93051731],
[0.93542156, 0.18717452, 0.87611239],
[0.9486485 , 0.11080808, 0.57557379],
[0.87873391, 0.13527156, 0.368202 ],
[0.78669284, 0.11830203, 0.2634972 ],
[0.68213238, 0.06408478, 0.33050376],
[0.54769163, 0.05391318, 0.31153485],
[0.63722088, 0.12484291, 0.0684501 ],
[0.64172725, 0.01517416, 0.09549566]])
The shape of the lattice should be (50, 100, 3)
lattice.shape
(50, 100, 3)
Since our lattice is made of 3-dimensional vectors, we can represent it as a lattice of colors.
import matplotlib.pyplot as plt
plt.imshow(lattice)
plt.show()
Since the most of the data will not be 3-dimensional, we can use the u_matrix
(unified distance matrix by Alfred Ultsch) to visualise the map and the clusters emerging on it.
um = u_matrix(lattice)
Each cell of the lattice is just a single value, thus the shape is:
um.shape
(50, 100)
The library contains a function plot_u_matrix
that can help visualise it.
plot_u_matrix(um, fig_size=(6.2,6.2))
To project data on the lattice, use project_on_lattice
function.
Let's project a couple of predefined color on the trained lattice and see in which cells they will end up:
colors = np.array([[1.,0.,0.],[0.,1.,0.],[0.,0.,1.],[1.,1.,0.],[0.,1.,1.],[1.,0.,1.],[0.,0.,0.],[1.,1.,1.]])
color_labels = ['red', 'green', 'blue', 'yellow', 'cyan', 'purple','black', 'white']
projection = project_on_lattice(colors, lattice, additional_list=color_labels)
for p in projection:
if projection[p]:
print (p, projection[p][0])
Projecting on SOM took: 0.158945 seconds.
(0, 85) blue
(2, 39) white
(5, 1) purple
(10, 60) cyan
(41, 59) green
(49, 12) red
(49, 40) yellow
(49, 96) black
To find every cell's closes vector in the provided data, use lattice_closest_vectors
function.
We can again use the colors example:
closest = lattice_closest_vectors(colors, lattice, additional_list=color_labels)
Finding closest data points took: 0.003056 seconds.
We can ask now to which value in color_labels
are out lattice cells closest to:
closest[(1,1)]
['purple']
closest[(40,80)]
['green']
We can find the closest vectors without supplying an additional list. Then we get the association between the lattice and the data vectors that we can display as colors.
closest_vec = lattice_closest_vectors(colors, lattice)
Finding closest data points took: 0.003491 seconds.
We take the values of the closest_vec
vector and reshape it into a numpy vector values
.
values = np.array(list(closest_vec.values())).reshape(50,100,-1)
We can now visualise the projection of our 8 hard-coded colors onto the lattice:
plt.imshow(values)
plt.show()
We can use the function lattice_activations
:
activations = lattice_activations(colors, lattice)
Computing SOM activations took: 0.000484 seconds.
Now we can show how the vector blue: [0.,0.,1.]
activates the lattice:
plt.imshow(activations[2])
plt.show()
If we wish to scale the higher values up, and scale down the lower values, we can use the argument exponent
when computing the activations:
activations = lattice_activations(colors, lattice, exponent=8)
Computing SOM activations took: 0.000838 seconds.
plt.imshow(activations[2])
plt.show()