Implementation of fuzzy clustering algorithms based on spanning trees.

Primary language: Python. License: MIT.

Minimal Spanning Tree (MST) clustering

Description

This repository provides a Python package for clustering NumPy arrays of n-dimensional vectors with methods built on a minimal spanning tree, such as the Zahn and Gath-Geva clustering algorithms.
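To make the idea of spanning-tree clustering concrete, here is a minimal standard-library sketch. It is not this package's implementation: it builds the tree with Prim's algorithm and then simply drops the k - 1 heaviest edges, a simplified stand-in for Zahn's edge-inconsistency criterion. The function names `mst_edges` and `mst_cluster` are hypothetical and exist only for this illustration.

```python
import math


def mst_edges(points):
    """Return MST edges of the complete Euclidean graph as (weight, i, j), via Prim's algorithm."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from each vertex to the tree
    parent = [-1] * n       # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Take the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax the distances of the remaining vertices through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_cluster(points, k):
    """Label points with k clusters by removing the k - 1 heaviest MST edges (assumed rule)."""
    n = len(points)
    edges = sorted(mst_edges(points))
    kept = edges[: max(len(edges) - (k - 1), 0)]

    # Union-find over the kept edges gives the connected components.
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, i, j in kept:
        root[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(mst_cluster(points, k=2))  # prints [0, 0, 0, 1, 1, 1]
```

The sketch runs in O(n^2) time and produces hard labels only, so it is meant for intuition on small inputs rather than as a substitute for the models in this package.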

Installation and usage

Install the package with pip:

    pip install git+https://github.com/whiteroomlz/mst-clustering.git

The ZahnModel and GathGevaModel classes use the multiprocessing module, so if your Pipeline includes either of them, your main script needs a guarded entry point:

import multiprocessing

if __name__ == "__main__":
    multiprocessing.freeze_support()
    ...

Usage example:

import numpy as np
import multiprocessing

from mst_clustering.clustering_models import ZahnModel
from sklearn.datasets import make_blobs
from mst_clustering import Pipeline


if __name__ == "__main__":
    multiprocessing.freeze_support()

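    # Generate 1000 ten-dimensional points grouped around 7 centers.
    X, y = make_blobs(n_samples=1000, n_features=10, centers=7)
    # Scale every row to unit Euclidean length.
    X /= np.sqrt(np.sum(X ** 2, axis=1))[:, np.newaxis]

    clustering = Pipeline(clustering_models=[
        ZahnModel(3, 1.5, 1e-4, max_num_of_clusters=7),
    ])
    # Fit the pipeline using 4 worker processes.
    clustering.fit(data=X, workers_count=4)

    # Read the results from the fitted pipeline.
    labels = clustering.labels
    partition = clustering.partition
    clusters_count = clustering.clusters_count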
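
The usage example scales every row of X to unit Euclidean length before fitting. The short sketch below isolates that step on a small random array (the array and the names `X_unit` and `X_alt` are made up for illustration) and shows an equivalent spelling with `np.linalg.norm`. Whether the package requires normalized input is not stated here, so treat this as an explanation of the preprocessing line only.

```python
import numpy as np

# Small random stand-in for the data matrix (rows are samples).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

# Euclidean norm of each row, kept as a column so it broadcasts over features.
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms

# The in-place form from the usage example, applied to a copy.
X_alt = X.copy()
X_alt /= np.sqrt(np.sum(X_alt ** 2, axis=1))[:, np.newaxis]

# Both spellings give the same unit-length rows.
print(np.allclose(X_unit, X_alt))
print(np.allclose(np.linalg.norm(X_unit, axis=1), 1.0))
```

Note that a row consisting only of zeros has norm 0 and would produce a division by zero, so such rows need to be filtered out or handled separately.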