Reproducing Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Distributed Data Processing Systems, assignment 1: reproducibility study
by András Schmelczer & Leonardo Pohl
Leiden University, 2021
module load python/3.6.0
python3 -m venv --copies .env
source .env/bin/activate
pip install -r requirements.txt
./start.sh
Wait until the master node has started. The process runs for a while and cleans up on KeyboardInterrupt.
In a separate terminal, run:
python3 run_experiment.py
python3 process_experiment.py
python3 chart_experiment_1.py
python3 chart_experiment_2.py
CDFs of running times of jobs in various bin ranges in the IO-heavy workload. Fair sharing greatly improves performance for small jobs, at the cost of slowing the largest jobs. Delay scheduling further improves performance, especially for medium-sized jobs.
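For readers unfamiliar with how such a CDF is built, here is a minimal sketch. It assumes the running times of the jobs in one bin are available as a plain list of seconds; the function name `empirical_cdf` and the sample values are hypothetical, and this is not necessarily how `chart_experiment_1.py` computes its curves.

```python
def empirical_cdf(running_times):
    """Return (sorted_times, cumulative_fractions) for one bin of jobs.

    The i-th fraction is the share of jobs that finished within
    the i-th smallest running time.
    """
    ordered = sorted(running_times)
    n = len(ordered)
    fractions = [(i + 1) / n for i in range(n)]
    return ordered, fractions


# Hypothetical running times (seconds) for the jobs of a single bin.
times, cdf = empirical_cdf([12.0, 7.5, 30.0, 7.5])
```

Plotting `times` against `cdf` as a step line for each scheduler gives one panel of the figure; a curve further to the left means jobs in that bin finished sooner.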
Average speedup of delay scheduling over naive fair sharing for jobs in each bin in the IO-heavy workload. The black lines show standard deviations.
Average speedup of delay scheduling over FIFO scheduling for jobs in each bin in the IO-heavy workload. The black lines show standard deviations.
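The per-bin averages and standard deviations in the two speedup charts can be sketched as follows. This assumes that speedup is defined as the job's running time under the baseline scheduler divided by its running time under delay scheduling, and that both runs list the same jobs in the same order. The function name `speedup_by_bin`, the dictionary layout, and the sample numbers are hypothetical, not taken from `chart_experiment_2.py`.

```python
from statistics import mean, pstdev


def speedup_by_bin(baseline, delay):
    """Map each bin to (mean speedup, standard deviation of speedup).

    baseline, delay: dicts mapping a bin id to a list of running times,
    with the same jobs in the same order in both dicts.
    """
    result = {}
    for bin_id, base_times in baseline.items():
        # Assumed definition: speedup = baseline time / delay scheduling time.
        ratios = [b / d for b, d in zip(base_times, delay[bin_id])]
        result[bin_id] = (mean(ratios), pstdev(ratios))
    return result


# Hypothetical running times (seconds) for two jobs in bin 1.
stats = speedup_by_bin({1: [10.0, 20.0]}, {1: [5.0, 5.0]})
```

A mean above 1 indicates that delay scheduling finished the jobs of that bin faster than the baseline. The sketch uses the population standard deviation (`pstdev`); whether the charts use the population or the sample variant is not stated in the source.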