A toy problem showing how to parallelise the SEAPODYM code along cohorts
You need:
- python 3.x
- MPI
- mpi4py
- defopt
Type
pip install mpi4py defopt
to install the packages. Then type
pip install -e .
to install cohort_parallel.
Specify the number of workers NW (= number of cores), the number of age groups (NA) and the number of time steps NT with the command
mpiexec -n NW python cohort_parallel/simulator.py --nt NT --na NA
By default, each task will take 0.015 sec per time step. A task can have up to NA time steps -- the worker takes other tasks to cover the number of time steps NT.
A full set of options is
python cohort_parallel/simulator.py -h
mpiexec -n 2 python cohort_parallel/simulator.py --na 4 --nt 8
The tables display the steps (rows) and the corresponding tasks executed by each worker (columns). For instance, worker 0 executes tasks 0 (step = 0-3) and task 7 (steps 4-7), as well as tasks 2 and 5.
Each step takes 0.015 seconds to execute. Since there are NT * NA steps, the total execution time 8 * 4 * 0.015 = 0.48 secs in this case. The wall clock time is 0.269 secs, which corresponds to a speedup of 0.48/0.269 = 1.787 for 2 processes.
Make sure you have "defopt" installed,
module load Python
pip install defopt --user
(You only need to do this once.)
Then type
sbatch slurm/simulator.sh
to submit the job. This will print something like
Submitted batch job JOBID
Record the job id JOBID returned by the above command. You can check progress of the job's execution with
squeue --me
Typing
tail cohort_parallel_JOBID.out
(replacing JOBID with above job ID number) will returned something like
Elapsed time: 1.67 secs
Speedup: 33.18x (best case would be 37)
You can adjust the number of workers (and other SLURM options) by passing "--ntasks NUM_WORKERS" to the "sbatch" command. The number of time steps (NT), the number of age groups (NA) and the number of data values to exchange between each pair of workers can be set by passing the "-t NT", "-a NA" and "-d NDATA" options to the SLURM script. For instance,
sbatch --ntasks=50 --nodes=2 slurm/simulator.sh -t 200 -a 100 -d 10000