The aim of this project is to implement the k-means algorithm in Rust. The source code includes a parallel implementation built with Rayon.
The main steps and characteristics of the k-means algorithm are:
- Initialization: The algorithm starts by randomly selecting K cluster centroids from the dataset.
- Assignment: Each data point is then assigned to the nearest centroid based on the Euclidean distance metric.
- Update: The centroids of each cluster are updated by taking the mean of all data points assigned to that cluster.
- Repeat: The assignment and update steps are repeated until convergence, that is, until the assignment of data points to clusters no longer changes.
- Optimal K: The choice of K, the number of clusters, can significantly impact the clustering results, and it is often determined using heuristics or optimization techniques.
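The steps above can be sketched in a few lines of sequential Rust. This is a minimal illustration using only the standard library, not the code from this repository: the `Point` alias and the `dist2`, `nearest`, and `kmeans` names are assumptions made for the example, points are assumed to be 2-D, and the initial centroids are passed in by the caller (for instance, K points picked at random from the dataset) rather than chosen inside the function.

```rust
/// A 2-D point. Hypothetical type for this sketch; the project's own type may differ.
type Point = [f64; 2];

/// Squared Euclidean distance. Comparing squared distances gives the same
/// nearest centroid as comparing true distances, without the square root.
fn dist2(a: &Point, b: &Point) -> f64 {
    (a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2)
}

/// Index of the centroid nearest to `p`. Assumes `centroids` is non-empty.
fn nearest(p: &Point, centroids: &[Point]) -> usize {
    let mut best = 0;
    for (i, c) in centroids.iter().enumerate() {
        if dist2(p, c) < dist2(p, &centroids[best]) {
            best = i;
        }
    }
    best
}

/// Runs k-means from the given initial centroids until the assignments stop
/// changing or `max_iters` is reached. Returns (centroids, assignments).
fn kmeans(
    points: &[Point],
    mut centroids: Vec<Point>,
    max_iters: usize,
) -> (Vec<Point>, Vec<usize>) {
    let mut assign = vec![usize::MAX; points.len()];
    for _ in 0..max_iters {
        // Assignment step: attach every point to its nearest centroid.
        let new_assign: Vec<usize> =
            points.iter().map(|p| nearest(p, &centroids)).collect();
        // Convergence: no point changed cluster since the previous iteration.
        if new_assign == assign {
            break;
        }
        assign = new_assign;

        // Update step: move each centroid to the mean of its assigned points.
        let mut sums = vec![[0.0f64; 2]; centroids.len()];
        let mut counts = vec![0usize; centroids.len()];
        for (p, &a) in points.iter().zip(&assign) {
            sums[a][0] += p[0];
            sums[a][1] += p[1];
            counts[a] += 1;
        }
        for (j, c) in centroids.iter_mut().enumerate() {
            // An empty cluster keeps its previous centroid.
            if counts[j] > 0 {
                let n = counts[j] as f64;
                *c = [sums[j][0] / n, sums[j][1] / n];
            }
        }
    }
    (centroids, assign)
}
```

Calling `kmeans(&points, initial_centroids, 100)` returns the final centroids together with one cluster index per input point. The assignment step handles each point independently, which is what makes it a natural candidate for a parallel iterator.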
If you want to create more or fewer points, you can use the "points_generator.rs" file located in the "bin" folder. Running the command below will generate points and store them in a "points.txt" file within the "inputs" folder.
cargo run --bin points_generator
The "examples" folder contains several implementations of the k-means algorithm, each differing from the others in some way. To run one of them, pass its name to cargo, for example:
cargo run --example parallel-iterations-2
The program will generate a plot and store it in the "outputs" folder.