/Game-Of-Life

Efficient Parallel code in MPI, MPI+OpenMP and CUDA for Game of Life

Primary LanguageCMIT LicenseMIT

Game-Of-Life

About GoL

John Conway came up with the Game of Life in 1970. The game demonstrates the fact that some simple local rules can lead to interesting large-scale life behavior(birth, reproduction and death). The game is played in a 2 dimensional grid N x N, made of cells, that can be either alive, or dead. The game does not have any players, thus it does not require any input by the user. Each cell has at most 8 neighbours, that determine its state in the next generation. The re-formation of the grid from generation to generation is done simultaneously, meaning that each state in the next generation depends exclusively in the state of the cell and its neighbours. Our goal is to implement this game, using Parallel programming. This program is a pure example of the value of parallelization, because of the grid, and the ability to split it to equal pieces.

Implementation

Our code for this assignment is splitted into many files. We have a directory which contains common files used by all the methods that we have developed. For each method (MPI, OpenMP, CUDA), we have a directory containing all the useful files. In the directory misc, there are various input and output files, while in the directory plots there are vrious plots regarding our time achievements while running this program in a supercomputer.

MPI - OpenMP

The program takes as input a (N x N grid, which is provided in a simple text file. The data (0s and 1s) must be all in one row. So, if for example we want a 4 x 4 grid, we are going to provide a file containing 16 characters.

Our goal was to design the MPI code by following the given instructions. Our main objectives were:

  • Reduction of idle time
  • Reduction of unnecessary computations
  • Parallelization of a great portion of the program
  • Use of datatypes

CUDA

CUDA is a platform designed jointly at software and hardware levels to make use of the GPU computational power in general-purpose applications at three levels:

  • Software: It allows to program the GPU with minimal but powerful SIMD extensions to enable heterogeneous programming and attain an efficient and scalable execution.
  • Firmware : It offers a driver oriented to GPGPU programming, which is compatible with that used for rendering. Straightforward APIs to manage devices, memory, etc.
  • Hardware: It exposes GPU parallelism for general-purpose computing via a number of multiprocessors endowed with cores and a memory hierarchy.

In this project, we will use the GPUs provided in ARGO in order to measure our timings for our CUDA implementation. Before the metrics though, we are going to briefly explain our code for CUDA, which can be found in cuda/gol.cu.

Benchmarking

The most important aspect when using parallel computing, is the time benefits that we get. There are a few metrics that we are taught, that can show whether or not the application of parallelism was successful. Those metrics are:

  • Parallel Run Time: he time that elapses from the moment that a parallel computation starts to the moment that the last processor finishes execution.
  • Speedup: The ratio of the serial runtime of the best sequential algorithm for solving a problem to the time taken by the parallel algorithm to solve the same problem on p processors.
  • Efficiency: The ratio of speedup to the number of processors. Efficiency measures the fraction of time for which a processor is usefully utilized.

Those metrics for the MPI code without the allreduce parameter are plotted as following:

alt text

alt text

alt text

alt text

Regarding our CUDA implementation, during our runs using 1 and 2 GPUs, those are the results for the parallel run time:

alt text

The full report is available here

Compilation and running for MPI

  • While in the Project folder Compile just by typing make
  • Cd to build folder and Run mpiexec -n <processes> ./main.x -l <loops> -n <grid_size> -i <input_file> -r <AllReduce_flag>

Compilation and running for Hybrid: MPI+OpenMP

  • Edit Makefile by adding flag -fopenmp
  • While in the Project folder Compile just by typing make
  • Cd to build folder and Run mpiexec -n <processes> ./main.x -l <loops> -n <grid_size> -i <input_file> -r <AllReduce_flag>

Compilation and running for CUDA

  • While in the cuda folder Compile just by typing nvcc -o executable gol.cu
  • Run by passsing the correct parameters to the executable produced by compilation

Note

For our experiments we used a cluster named Argo which runs the executables by submiting jobs via qsub. If you have access to a similar cluster you can definitely use our scripts for your ease!

Project implemented by

Nikos Galanis
Pantelis Papageorgiou