Write a program that will calculate the Pi number value with a Leibniz formula. The program takes a number of iterations as a parameter, executes and prints out the computed Pi value. Write two versions of the program:
serial version using standard C; parallel version using POSIX threads (it should take the number of threads as a program argument). Test the serial and parallel programs on a multi-core/mutli-processor machine. Pick a number of iterations that will allow the serial version to run for few (2-5) seconds. Compare the performance of the POSIX thread version tested with different number of threads (e.g. 1, 2, 4 and 8) with the performance of the serial version. The 'time' shell command can be used to measure the execution times.
-
Step 1
gcc -o serial_pi serial_pi.c gcc -o parallel_pi parallel_pi.c
-
Step 2:
-
Step 2.1
time ./serial_pi 100000000
-
Step 2.2
time ./parallel_pi 100000000 1
where the first arg is the number of loops to calculate Pi and the second arg is the number of threads.
-
Let's break down the results for both the serial and parallel versions of the program:
time ./serial_pi 100000000
Computed Pi value: 3.141592643589326
./serial_pi 100000000 0,37s user 0,00s system 40% cpu 0,904 total
time ./parallel_pi 100000000 4
Computed Pi value: 3.141592643589817
./parallel_pi 100000000 4 0,38s user 0,00s system 113% cpu 0,332 total
time ./parallel_pi 100000000 2
Computed Pi value: 3.141592643590251
./parallel_pi 100000000 2 0,41s user 0,00s system 197% cpu 0,209 total
time ./parallel_pi 100000000 1
Computed Pi value: 3.141592643589326
./parallel_pi 100000000 1 0,37s user 0,00s system 98% cpu 0,379 total
time ./parallel_pi 100000000 8
Computed Pi value: 3.141592643589880
./parallel_pi 100000000 8 0,47s user 0,00s system 671% cpu 0,071 total
- user: The amount of CPU time spent in user mode.
- system: The amount of CPU time spent in kernel mode.
- cpu: The percentage of the CPU's total capacity used during the execution.
- total: The total elapsed time (wall-clock time) for the program to complete.
- Computed Pi value: 3.141592643589326
- Time metrics:
0.37s user
: The program spent 0.37 seconds of CPU time.0.00s system
: The program spent negligible time in kernel mode.40% cpu
: The program used 40% of the CPU's capacity.0.904 total
: The total elapsed time was 0.904 seconds.
- Computed Pi value: 3.141592643589817
- Time metrics:
0.38s user
: The combined user time across all threads was 0.38 seconds.0.00s system
: The program spent negligible time in kernel mode.113% cpu
: The program used 113% of the CPU's capacity (indicating that multiple CPU cores were utilized).0.332 total
: The total elapsed time was 0.332 seconds.
- Computed Pi value: 3.141592643590251
- Time metrics:
0.41s user
: The combined user time across all threads was 0.41 seconds.0.00s system
: The program spent negligible time in kernel mode.197% cpu
: The program used 197% of the CPU's capacity (indicating that multiple CPU cores were utilized).0.209 total
: The total elapsed time was 0.209 seconds.
- Computed Pi value: 3.141592643589880
- Time metrics:
0.47s user
: The combined user time across all threads was 0.47 seconds.0.00s system
: The program spent negligible time in kernel mode.671% cpu
: The program used 671% of the CPU's capacity (indicating that multiple CPU cores were utilized).0.071 total
: The total elapsed time was 0.071 seconds.
- Computed Pi value: 3.141592643589326
- Time metrics:
0.37s user
: The program spent 0.37 seconds of CPU time.0.00s system
: The program spent negligible time in kernel mode.98% cpu
: The program used 98% of the CPU's capacity.0.379 total
: The total elapsed time was 0.379 seconds.
-
Accuracy: All versions (serial and parallel) produced Pi values that are very close to each other, indicating that the parallelization did not affect the computation accuracy much.
-
Performance:
- The serial version took the longest time to complete (
0.904 seconds
). - The parallel version with 8 threads performed the best, taking only
0.071 seconds
. - As the number of threads decreases, the performance gets closer to the serial version, with 1 thread essentially matching the serial version's performance (
0.379 seconds
).
- The serial version took the longest time to complete (
-
CPU Utilization:
- The serial version used only
40%
of the CPU's capacity, indicating it was not fully utilizing the available CPU cores. - The parallel version with 4 threads used
113%
of the CPU's capacity, showing that it utilized more than one core effectively. - The parallel version with 2 threads used
197%
of the CPU's capacity, indicating it effectively used two cores. - The parallel version with 8 threads used
671%
of the CPU's capacity, indicating it effectively used two cores.
- The serial version used only
- Scalability: The parallel version scales well with the number of threads, reducing the total elapsed time significantly when using multiple threads.
- Optimal Threads: The optimal number of threads seems to be 4 or 8. More threads might yield diminishing returns due to overhead or limitations in parallelization.
- Anotations: When the number of loops is incremented the accuracy improves; but so does the time.
- Jorge Ortega Izquierdo
- Adrián Anta Gil