Machine learning has become particularly popular in recent years. Computers can now perform complex tasks without human intervention. We want to introduce the basics of machine learning.
For our project, we decided to build data visualizations of a machine learning algorithm called K-means clustering. We want to provide an intuitive visual for this popular algorithm. The goal of the algorithm is to partition the data into k groups (clusters). It works iteratively, assigning each data point to a group based on similarity. K-means clustering is an example of unsupervised learning, where the data has not been explicitly labeled.
Clustering is often used in industry to study user purchase behavior or to group images and videos.
- Users can scroll through the page to view visualizations of K-means clustering
- Explains how the K-means clustering algorithm is implemented
- Animated visualizations using D3.js and X3Dom.js
Our webpage relies on D3 to render visualizations of the algorithm. D3 binds data to DOM elements and updates them dynamically, using HTML, CSS, and SVG. Below is an example of how we used D3 to update the position of the centroids for our 2D visualization.
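function update() {
  // Points and centroids are drawn as circles from one combined array.
  let data = points.concat(centroids);
  let circle = group.selectAll("circle")
    .data(data);

  // Create a circle for every datum that does not have one yet.
  // Note: D3 v3 merges these into the update selection; D3 v4+ needs .merge().
  circle.enter().append("circle")
    .attr("id", function(d) { return d.id; })
    .attr("class", function(d) { return d.type; })
    .attr("r", 5);

  // Animate each circle to its current position and cluster color.
  circle.transition().delay(10).duration(100)
    .attr("cx", function(d) { return d.x; })
    .attr("cy", function(d) { return d.y; })
    .style("fill", function(d) { return d.fill; });

  // Remove circles whose datum no longer exists.
  circle.exit().remove();
}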
The K-means clustering algorithm calculates the distance between each data point and every centroid. Each data point is then assigned to its closest centroid, which forms the clusters. Each centroid is then moved to the center of its cluster, and the two steps repeat until a maximum number of iterations is reached. We then render the result using D3.js.
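// Update step: move each centroid to the center of the points assigned to it.
function moveCentroids() {
  centroids.forEach(function(d) {
    // A point belongs to a centroid when they share the same fill color.
    let cluster = points.filter(function(e) {
      return e.fill === d.fill;
    });
    let center = computeClusterCenter(cluster);
    d.x = center[0];
    d.y = center[1];
  });
}

// Assignment step: return the centroid nearest to the given point.
function findClosestCentroid(point) {
  // Start with a distance larger than any expected inside the plot.
  let closest = {i: -1, distance: width * 2};
  centroids.forEach(function(d, i) {
    let distance = getEuclidianDistance(d, point);
    if (distance < closest.distance) {
      closest.i = i;
      closest.distance = distance;
    }
  });
  return (centroids[closest.i]);
}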
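The two functions above cover the update and assignment steps, but the helpers they call and the loop that drives them are not shown here. The following is a minimal, self-contained sketch of how the pieces could fit together. The bodies of `getEuclidianDistance` and `computeClusterCenter`, the `runKMeans` driver, and the demo data are assumptions made for illustration, not the project's actual code.

```javascript
// Assumed helper: straight-line distance between two {x, y} objects.
function getEuclidianDistance(a, b) {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Assumed helper: mean position of a non-empty array of {x, y} points.
function computeClusterCenter(cluster) {
  const sumX = cluster.reduce((sum, p) => sum + p.x, 0);
  const sumY = cluster.reduce((sum, p) => sum + p.y, 0);
  return [sumX / cluster.length, sumY / cluster.length];
}

// Hypothetical driver: alternates assignment and update steps
// for a fixed number of iterations, mutating its arguments in place.
function runKMeans(points, centroids, maxIterations) {
  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: each point takes the fill of its nearest centroid.
    points.forEach(function(p) {
      let best = Infinity;
      let closest = null;
      centroids.forEach(function(c) {
        const distance = getEuclidianDistance(c, p);
        if (distance < best) {
          best = distance;
          closest = c;
        }
      });
      p.fill = closest.fill;
    });

    // Update step: move each centroid to the center of its cluster.
    centroids.forEach(function(c) {
      const cluster = points.filter(p => p.fill === c.fill);
      if (cluster.length === 0) return; // leave an empty cluster's centroid in place
      const center = computeClusterCenter(cluster);
      c.x = center[0];
      c.y = center[1];
    });
  }
  return centroids;
}

// Demo data (made up): two obvious groups and two starting centroids.
const demoPoints = [
  {x: 0, y: 0}, {x: 0, y: 2}, {x: 10, y: 10}, {x: 10, y: 12}
];
const demoCentroids = [
  {x: 1, y: 1, fill: "red"},
  {x: 9, y: 9, fill: "blue"}
];
runKMeans(demoPoints, demoCentroids, 5);
```

Unlike the functions above, this sketch takes `points` and `centroids` as parameters instead of reading shared variables, which makes it easier to try in isolation. Skipping empty clusters is one simple choice; re-seeding the centroid at a random point is another.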
Principal Component Analysis (PCA) is a technique used to transform a high-dimensional dataset into a lower-dimensional subspace prior to running a machine learning algorithm on the data. Reducing the data to two or three dimensions makes it easier to describe and visualize.
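To make the idea concrete, here is a rough sketch of PCA reduced to its simplest case: finding only the first principal component and projecting the data onto it. It uses power iteration on the covariance matrix, which is one of several ways to do this and is not necessarily how this project computes PCA. All function names and the sample data are hypothetical.

```javascript
// Subtract the per-column mean so the data is centered at the origin.
function centerData(rows) {
  const n = rows.length;
  const mean = new Array(rows[0].length).fill(0);
  rows.forEach(r => r.forEach((v, j) => { mean[j] += v / n; }));
  return rows.map(r => r.map((v, j) => v - mean[j]));
}

// Sample covariance matrix of already-centered rows.
function covarianceMatrix(centered) {
  const n = centered.length;
  const d = centered[0].length;
  const cov = Array.from({length: d}, () => new Array(d).fill(0));
  centered.forEach(function(r) {
    for (let i = 0; i < d; i++) {
      for (let j = 0; j < d; j++) {
        cov[i][j] += r[i] * r[j] / (n - 1);
      }
    }
  });
  return cov;
}

// Power iteration: repeatedly multiply a vector by the covariance matrix
// and normalize it. It converges toward the direction of largest variance,
// provided the all-ones starting vector is not orthogonal to that direction.
function firstPrincipalComponent(cov, iterations = 100) {
  let v = new Array(cov.length).fill(1);
  for (let k = 0; k < iterations; k++) {
    const next = cov.map(row => row.reduce((s, c, j) => s + c * v[j], 0));
    const norm = Math.hypot(...next);
    if (norm === 0) break; // no variance along the current vector
    v = next.map(x => x / norm);
  }
  return v;
}

// Project each centered row onto the component, giving one number per row.
function projectOnto(centered, component) {
  return centered.map(r => r.reduce((s, x, j) => s + x * component[j], 0));
}

// Made-up 3D data that varies only along the x = y direction.
const sampleRows = [[1, 1, 0], [2, 2, 0], [3, 3, 0], [4, 4, 0]];
const centeredRows = centerData(sampleRows);
const component = firstPrincipalComponent(covarianceMatrix(centeredRows));
const projected = projectOnto(centeredRows, component);
```

For the sample data, `component` points along the diagonal of the x-y plane and `projected` holds a single coordinate per row, so three dimensions are reduced to one. Reducing to two or three dimensions would require further components, which this sketch does not compute.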
X3Dom.js allows DOM manipulation in 3D space. We use it to animate the K-means algorithm in a 3D scene, which can be dragged to view it from different angles.
ScrollMagic.io allows animations or functions to be invoked based on a scroll trigger element. We set the trigger to a specific HTML element; when the user scrolls to that element on the page, the function is invoked.
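// `ctrl` (the ScrollMagic controller) and `count` are defined elsewhere.
let twoD = new ScrollMagic.Scene({
  triggerElement: "#kmeans",
  triggerHook: 'onEnter',
  duration: 200
}).on('start', function(e) {
  // Start the animation only once, the first time the user scrolls down to it.
  if (e.scrollDirection === "FORWARD" && count === 0) {
    kMeans("#kmeans", 400, 400, 150, 3, 15);
    count += 1;
  }
})
.addTo(ctrl);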
UCI Machine Learning Repository
We want to use our algorithm to visualize a practical example (e.g., customer segmentation or social circles).