SOMBER

somber (Somber Organizes Maps By Enabling Recurrence) is a collection of NumPy/Python implementations of various kinds of Self-Organizing Maps (SOMs), with a focus on SOMs for sequence data.

To the best of my knowledge, the sequential SOM algorithms implemented in this package haven't been open-sourced yet. If you do find examples, please let me know, so I can compare and link to them.

The package currently contains implementations of:

Regular SOM (SOM) (Kohonen, various publications)

Recursive Som (RecSOM) (Voegtlin, 2002)

Neural Gas (NG) (Martinetz & Schulten, 1991)

Recursive Neural Gas (Voegtlin, 2002)

Because these sequential SOMs rely on internal dynamics for convergence, i.e. they are not trained against an external label the way a regular Recurrent Neural Network is, processing in a sequential SOM is currently strictly online: every example is processed separately, and the weights are updated after every example. Research into batching and/or multi-threading is currently underway.
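To make "strictly online" concrete, here is a minimal NumPy sketch of one online pass of a plain (non-recurrent) SOM. It is not SOMBER's implementation: the function name `online_som_epoch`, the Gaussian neighborhood, and the `sigma` parameter are assumptions made for illustration. The point is only that the weights change after every single example, so each example sees the result of all earlier updates.

```python
import numpy as np


def online_som_epoch(weights, grid, X, learning_rate=0.3, sigma=1.0):
    """Run one online pass over X, updating the weights after every example.

    weights: (num_neurons, num_features) float array, modified in place.
    grid: (num_neurons, 2) array with each neuron's coordinates on the map.
    """
    for x in X:
        # Best matching unit: the neuron whose weight vector is closest to x.
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Gaussian neighborhood around the BMU, measured on the map grid.
        grid_dist = np.sum((grid - grid[bmu]) ** 2, axis=1)
        influence = np.exp(-grid_dist / (2 * sigma ** 2))
        # Update immediately, before the next example is processed.
        weights += learning_rate * influence[:, None] * (x - weights)
    return weights


rng = np.random.default_rng(0)
width, height, num_features = 5, 5, 3
grid = np.array([(i, j) for i in range(width) for j in range(height)],
                dtype=float)
weights = rng.random((width * height, num_features))
X = rng.random((20, num_features))
online_som_epoch(weights, grid, X)
```

A batched variant would instead accumulate the updates for several examples and apply them at once, which is exactly what the internal dynamics of the recurrent models make difficult.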

If you need a fast regular SOM, check out SOMPY, which is a direct port of the MATLAB SOM Toolbox.

Usage

Care has been taken to make SOMBER easy to use, and function like a drop-in replacement for sklearn-like systems. The non-recurrent SOMs take as input [M * N] arrays, where M is the number of samples and N is the number of features. The recurrent SOMs take as input [M * S * N] arrays, where M is the number of sequences, S is the number of items per sequence, and N is the number of features.
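As a quick illustration of the two input layouts, the sketch below builds arrays of the right shape with plain NumPy. The variable names and sizes are arbitrary and only serve the example. Note that a single sequence still needs a leading axis of length 1.

```python
import numpy as np

num_samples, num_features = 15, 3
num_sequences, items_per_sequence = 1, 100

# Non-recurrent SOMs: an [M * N] array, one row per sample.
X_flat = np.random.rand(num_samples, num_features)

# Recurrent SOMs: an [M * S * N] array, one sequence per entry on the first axis.
X_seq = np.random.rand(num_sequences, items_per_sequence, num_features)

# A single sequence stored as [S * N] gains the leading axis via None-indexing.
single_sequence = np.random.rand(items_per_sequence, num_features)
X_single = single_sequence[None, :, :]

print(X_flat.shape, X_seq.shape, X_single.shape)
```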

Examples

Colors

Color clustering is a kind of Hello, World for SOMs, because it nicely demonstrates how SOMs create a continuous mapping. The color dataset comes from this nice blog.

import numpy as np

from somber import Som

X = np.array([[0., 0., 0.],
              [0., 0., 1.],
              [0., 0., 0.5],
              [0.125, 0.529, 1.0],
              [0.33, 0.4, 0.67],
              [0.6, 0.5, 1.0],
              [0., 1., 0.],
              [1., 0., 0.],
              [0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.],
              [1., 1., 1.],
              [.33, .33, .33],
              [.5, .5, .5],
              [.66, .66, .66]])

color_names = ['black', 'blue', 'darkblue', 'skyblue',
               'greyblue', 'lilac', 'green', 'red',
               'cyan', 'violet', 'yellow', 'white',
               'darkgrey', 'mediumgrey', 'lightgrey']

# initialize
s = Som((10, 10), data_dimensionality=3, learning_rate=0.3)

# train
# 10 epochs with 10 updates each = 100 updates to the parameters.
s.fit(X, num_epochs=10, updates_epoch=10)

# predict: get the index of each best matching unit.
predictions = s.predict(X)
# quantization error: how well do the best matching units fit?
quantization_error = s.quantization_error(X)
# inversion: associate each node with the exemplar that fits best.
inverted = s.invert_projection(X, color_names)
# Mapping: get weights, mapped to the grid points of the SOM
mapped = s.map_weights()

import matplotlib.pyplot as plt

plt.imshow(mapped)
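For readers unfamiliar with the quantization error used above, the following sketch shows one common definition: the mean Euclidean distance between each sample and the weight vector of its best matching unit. This is an assumption about the general technique, not a description of what `s.quantization_error` returns; the function name `mean_quantization_error` is hypothetical.

```python
import numpy as np


def mean_quantization_error(X, weights):
    """Mean distance from each sample to its closest weight vector.

    X: (num_samples, num_features) array.
    weights: (num_neurons, num_features) array.
    """
    # Pairwise distances, shape (num_samples, num_neurons).
    distances = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    # Keep the distance to the best matching unit, then average over samples.
    return distances.min(axis=1).mean()


toy_weights = np.array([[0.0, 0.0], [1.0, 1.0]])
toy_X = np.array([[0.0, 0.0], [1.0, 0.0]])
print(mean_quantization_error(toy_X, toy_weights))  # 0.5
```

A lower value means the map's weight vectors lie closer to the data.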

Sequences

In this example, we will show that the RecursiveSOM is able to memorize short sequences generated by a Markov chain. We will also demonstrate that the RecursiveSOM can generate sequences that are consistent with the sequences on which it has been trained.

import numpy as np

from somber import RecursiveSom
from string import ascii_lowercase

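# Simple sequence generator: samples from a first-order Markov chain.
def seq_gen(num_to_gen, probas):
    """Return a [1 * num_to_gen * num_symbols] one-hot array and the symbols."""
    symbols = ascii_lowercase[:probas.shape[0]]
    identities = np.eye(probas.shape[0])
    seq = []
    ids = []
    # Index of the previous symbol; generation starts from state 0 ('a').
    r = 0
    choices = np.arange(probas.shape[0])
    for _ in range(num_to_gen):
        # Sample the next symbol given the previous one.
        r = np.random.choice(choices, p=probas[r])
        ids.append(symbols[r])
        seq.append(identities[r])

    return np.array(seq)[None, :, :], ids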

# Transition probabilities (row = previous symbol, column = next symbol).
# after an a, we have a 50% chance of b and a 50% chance of c
# after a b, we have a 100% chance of a
# after a c, we have a 50% chance of b and a 50% chance of c
# therefore, we never expect a repeated a or b, but we do expect
# repeated c.
probas = np.array(((0.0, 0.5, 0.5),
                   (1.0, 0.0, 0.0),
                   (0.0, 0.5, 0.5)))

X, ids = seq_gen(10000, probas)

# initialize
# alpha = contribution of non-recurrent part to the activation.
# beta = contribution of recurrent part to activation.
# here alpha is higher than beta, so the non-recurrent part contributes
# more to the activation than the recurrent part.
s = RecursiveSom((10, 10),
                 data_dimensionality=3,
                 learning_rate=0.3,
                 alpha=1.2,
                 beta=.9)

# train
# show a progressbar.
s.fit(X, num_epochs=100, updates_epoch=10, show_progressbar=True)

# predict: get the index of each best matching unit.
predictions = s.predict(X)
# quantization error: how well do the best matching units fit?
quantization_error = s.quantization_error(X)

# inversion: associate each node with the exemplar that fits best.
inverted = s.invert_projection(X, ids)

# find which sequences are mapped to which neuron.
receptive_field = s.receptive_field(X, ids)

# generate some data by starting from some position.
# the position can be anything, but must have a dimensionality
# equal to the number of neurons.
starting_pos = np.ones(s.num_neurons)
generated_indices = s.generate(50, starting_pos)

# turn the generated indices into a sequence of symbols.
generated_seq = inverted[generated_indices]
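One way to check whether a generated sequence is "consistent" with the training data is to look for transitions that the Markov chain can never produce. The helper below is a hypothetical sketch using only NumPy and the standard library; `impossible_transitions` is not part of SOMBER. It assumes the symbols are the lowercase letters used by `seq_gen` above.

```python
import numpy as np
from string import ascii_lowercase


def impossible_transitions(symbols, probas):
    """Return the (previous, next) symbol pairs that have zero probability."""
    # Map each symbol to its row/column index in the transition matrix.
    index = {s: i for i, s in enumerate(ascii_lowercase[:probas.shape[0]])}
    return [(prev, nxt) for prev, nxt in zip(symbols, symbols[1:])
            if probas[index[prev], index[nxt]] == 0.0]


probas = np.array(((0.0, 0.5, 0.5),
                   (1.0, 0.0, 0.0),
                   (0.0, 0.5, 0.5)))

# A sequence the chain can produce yields no violations.
print(impossible_transitions(list("acbab"), probas))  # []
# A repeated 'a' can never occur, so it is reported.
print(impossible_transitions(list("aab"), probas))  # [('a', 'a')]
```

Applied to `generated_seq`, an empty result would indicate that the model has not produced any transition that is absent from the training distribution.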

TODO

See issues for TODOs/enhancements. If you use SOMBER, feel free to send me suggestions!

Contributors

Stéphan Tulkens

LICENSE

MIT
