OWCK: A Python repository from diversoft

The Optimal Weighted Cluster Kriging/Gaussian Process class

This class inherits from the GaussianProcess class in the sklearn library. Most of its parameters are documented in sklearn.gaussian_process.

Please check the docstring of Gaussian Process parameters in sklearn. Only newly introduced parameters are documented below.

Install Instructions

Run install.py, or install directly with pip. You need OpenMPI installed to use the parallel options.

Pip:

pip install OWCK

Parameters

n_cluster : int, optional: The number of clusters, which determines the number of Gaussian Process models to build. It is the speed-up factor in OWCK: each model is fitted on a smaller subset of the data.
cluster_method : string, optional: The clustering algorithm used to partition the data set. The built-in clustering algorithms are:

'k-mean', 'GMM', 'fuzzy-c-mean', 'random', 'tree'. Note that GMM and fuzzy-c-mean are fuzzy clustering algorithms; with these algorithms you can set the overlap you desire. 'tree' is a clustering approach based on a regression tree.
overlap : float, optional: The percentage of overlap when using a fuzzy cluster method. Each cluster will be of the same size.
is_parallel : boolean, optional: Switches parallel model fitting on. If True, all the underlying Gaussian Process models are fitted in parallel, supported by MPI. Otherwise, the models are fitted sequentially.
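
To make the roles of n_cluster, the 'random' cluster method and is_parallel concrete, here is a minimal sketch in plain Python. It is not the OWCK implementation: random_partition, fit_mean_model and fit_all are hypothetical helper names, the "model" is just a cluster mean standing in for a Gaussian Process, and a thread pool stands in for MPI.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def random_partition(n_samples, n_cluster, seed=0):
    """Shuffle sample indices and deal them into n_cluster groups of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_cluster] for k in range(n_cluster)]


def fit_mean_model(targets):
    """Hypothetical stand-in for fitting one Gaussian Process: returns the cluster mean."""
    return sum(targets) / len(targets)


def fit_all(y, clusters, is_parallel=False):
    """Fit one stand-in model per cluster, sequentially or with a thread pool."""
    per_cluster_targets = [[y[i] for i in idx] for idx in clusters]
    if is_parallel:
        # Assumption: threads replace MPI here purely for illustration.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(fit_mean_model, per_cluster_targets))
    return [fit_mean_model(t) for t in per_cluster_targets]


y = [float(i) for i in range(12)]
clusters = random_partition(len(y), n_cluster=3)
models = fit_all(y, clusters, is_parallel=True)
```

One model is produced per cluster, so n_cluster fixes the length of the resulting list, and the parallel and sequential paths return the same models in the same order.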

Attributes

cluster_label : the cluster label of the training set after clustering.
clusterer : the clustering algorithm used.
models : a list of (fitted) Gaussian Process models built on each cluster.
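
This README does not spell out how the predictions of the individual models are merged into one. A standard choice, assumed here for illustration, is inverse-variance weighting, which minimizes the variance of the combined prediction when the models' errors are independent. The function name combine_predictions is hypothetical and not part of the OWCK API.

```python
def combine_predictions(means, variances):
    """Combine per-model predictions using inverse-variance weights.

    means, variances: one predicted mean and one predictive variance per model,
    all for the same query point. Variances must be strictly positive.
    """
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    weights = [w / total for w in inverse]  # weights sum to 1
    mean = sum(w * m for w, m in zip(weights, means))
    # Variance of a weighted sum of independent predictors
    variance = sum(w * w * v for w, v in zip(weights, variances))
    return mean, variance, weights


mean, variance, weights = combine_predictions([1.0, 3.0], [1.0, 3.0])
```

The more certain model (smaller variance) receives the larger weight, and the combined variance is never larger than the smallest individual variance.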

Usage

Example code:

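from OWCK import OWCK

# X, y: training inputs and targets; x_new: the points to predict at
owck_model = OWCK(cluster_method='tree')
owck_model.fit(X, y)
# pred_y holds the predictions, var_y the corresponding variances
pred_y, var_y = owck_model.predict(x_new)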

References

[SWKBE15]

Bas van Stein, Hao Wang, Wojtek Kowalczyk, Thomas Baeck and Michael Emmerich. Optimally Weighted Cluster Kriging for Big Data Regression. In 14th International Symposium, IDA 2015, pages 310-321, 2015. http://link.springer.com/chapter/10.1007%2F978-3-319-24465-5_27