/Parallel-K-Means

A parallel implementation of K-Means algorithm in C++ and OpenMP.

Primary LanguageC++MIT LicenseMIT

Parallel K Means

A parallel implementation of K-Means algorithm in C++ and OpenMP.

The task of k-means is to divide points within a space in K groups based on their characteristics. It is based on few steps:

  1. generate N points;
  2. generate K centroids that represent K clusters;
  3. for each point, compute the distance between the point and all of the clusters and assign the point to the nearest cluster;
  4. update the centroid’s characteristics, specially its coordinates, based on the new points inside the cluster;
  5. repeat from 3 until reaching a maximum number of iterations or until the clusters won’t move.

Implementation

The implementation covres both sequential and parallel version, in C++ and OpenMP.

Sequential version

Two classes are defined:

  • Point.h
    • 2D points chosen randomly;
    • x_coord, y_coord, cluster_id;
    • setter() and getter() methods.
  • Cluster.h
    • 2D points chosen randomly at first;
    • x_coord, y_coord, size, new_x_coord, new_y_coord;
    • setter() and getter() methods and others to update the coordinates of the centroids.

The file main_sequential starts the sequential version:

  • in this program, points and clusters are 2D points chosen randomly;
  • the first functions initialize clusters and points;
  • after that, there is a while loop where are the 2 most important functions:
    • compute_distance();
    • update_clusters();

Parallel version

The program is mostly parallelizable in the compute_distance() function and we can parallelize the outer for. OpenMP was choosen in order to parallelize the computing:

  • Private variables for each thread: min_distance and min_index;
  • Firstprivate variables for each thread: points_size and clusters_size;
  • Shared variable: vector of points and vector of clusters;
  • since the amount for computation is equal for each thread, it was chosen a static scheduling.

Experiments and results

Here is a table that shows the speedup analysis:

Here an example of the execution of the program plotted with Gnuplot:

Reports

A copy of the report (italian) can be found here.

A copy of the presentation can be found here.

License

Licensed under the term of MIT License.