Miner

Miner is a toy library for data mining. The main goal of this library is to provide an introduction to different data mining techniques while learning on the subject myself.

K-means clustering

A simple yet powerful algorithm for cluster analysis is the k-means algorithm. This algorithm will partition a set of points over k clusters. After the algorithm has converged, the clusters property of the kmeans objects (kmeans.clusters) will contain a dictionary with indexes that refer to the elements in space.point.

import miner.utils
import miner.clustering

space = miner.utils.Space()
space.points([(2, 2), (2, 1), (2, 3), (2, -2), (2, -1), (2, -3)])

kmeans = miner.clustering.KMeans(2, space)
kmeans.converge()

K-nearest neighbor

If you want to classify a certain record by comparing it to training data, you can employ a classification algorithm, like KNN. This simple algorithm calculates the distance between the record to be classified and the training records. The algorithm will return the class label which is most common for the k nearest records.

import miner.utils
import miner.classification

# Set up our matrix for records with 4 attributes
matrix = miner.utils.Matrix(4)
matrix.records([[1.0, 2.3, 5.0, 3.0],
                [2.0, 4.0, 1.2, 1.8],
                [15.0, 12.2, 13.0, 1.1],
                [10.0, 9.4, 8.4, 1.],
                [-10.0, 3.2, 1.6, -1.0],
                [-1.0, 2.2, 1.2, -3.0]])
# Associate the class label with the records
matrix.classes(['A', 'A', 'B', 'B', 'C', 'C'])
matrix.normalize()

# Find the class label for the following record, for k = 3
knn = miner.classification.KNearestNeighbor(3, matrix)
print knn.classify([1.2, 4.2, 4.2, -1.2])

License

This library is released under the MIT license.

joelcox/miner

Miner

K-means clustering

K-nearest neighbor

License