abhinav-vaishya/Parallel-Computing---MPI-OpenMP-CUDA

C++

Introduction to Parallel Scientific Computing - Assignment 3

- All code is in the codes/ folder.

Question 2

With collapse(3), only the three outermost loops are associated with the directive, so the variable L is not private to each thread. The threads then race on the shared L, which makes the result inconsistent.
To fix this, write collapse(4) instead of collapse(3).
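
The fix can be illustrated with a minimal sketch. The assignment's loop nest is not reproduced in this README, so the function name `nested_sum`, the loop body and the lowercase loop variables below are hypothetical; the sketch only assumes a four-deep loop whose variables are declared outside the nest.

```cpp
// Hypothetical four-deep loop nest (not the assignment's code).
// Sums i + j + k + l over an n^4 iteration space.
long long nested_sum(int n) {
    long long total = 0;
    int i, j, k, l;  // declared outside the nest, so shared by default

    // collapse(4) associates all four loops with the directive, so
    // i, j, k and l are each made private to every thread.
    // With collapse(3), l would stay shared and the threads would race on it.
    #pragma omp parallel for collapse(4) reduction(+ : total)
    for (i = 0; i < n; ++i)
        for (j = 0; j < n; ++j)
            for (k = 0; k < n; ++k)
                for (l = 0; l < n; ++l)
                    total += i + j + k + l;

    return total;
}
```

Compiled with OpenMP enabled (for example `-fopenmp` with GCC), `nested_sum(3)` should return the same value as a serial run. Declaring the innermost variable inside its `for` statement, or listing it in a `private` clause, are other common ways to avoid the race.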

Question 3

All implementations produce correct results, although the OpenMP code does not work for arrays larger than 1000 elements.
The CUDA code is the fastest of all: it takes 3 s for an array of size 1e8.

Question 4

For n = 1000 and threads = 20:
Error for serial implementation = 0.18649
Runtime for serial code: 11335174 ms
Error for parallel implementation = 0.18649
Runtime for parallel code: 1551693 ms
The parallel code is about 7.3x faster than the serial code, with the same error.

Question 5

A single kernel is enough to compute all four values, because min, max, mean and std are kept in shared variables.
- min and max are updated directly as each value is read.
- For the mean, keep a running sum of the values and divide by the total number of samples at the end.
- For std, keep a running sum of the squares of the values and use std = sqrt(sum_of_squares / N - mean^2).
Time taken for an array of size 1e8: 0.13 s
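
The accumulation described above can be sketched serially in C++. This is not the CUDA kernel from codes/, only an illustration of the same single-pass arithmetic; the names `Stats` and `single_pass_stats` are hypothetical, and the input is assumed to be non-empty.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical result type for this sketch.
struct Stats {
    double min, max, mean, stddev;
};

// One pass over the data: track min, max, the sum and the sum of squares.
// Assumes x is non-empty.
Stats single_pass_stats(const std::vector<double>& x) {
    double lo = x[0], hi = x[0];
    double sum = 0.0, sum_sq = 0.0;
    for (double v : x) {
        lo = std::min(lo, v);
        hi = std::max(hi, v);
        sum += v;
        sum_sq += v * v;
    }
    const double n = static_cast<double>(x.size());
    const double mean = sum / n;
    // std = sqrt(sum_of_squares / N - mean^2); clamp at zero so that
    // rounding error cannot produce a negative variance.
    const double var = std::max(0.0, sum_sq / n - mean * mean);
    return {lo, hi, mean, std::sqrt(var)};
}
```

In a GPU version the four accumulators would typically be combined across threads with atomic operations or a reduction. The sum-of-squares formula can lose precision when the mean is large relative to the spread of the data, so the clamp above is only a guard, not a cure.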

Question 6

Parallelised the given serial implementation.