
CO2 emission and runtime analysis of different scheduling strategies


Carbon Savings Upper Bound Analysis

The aim of this repository is to provide the code to reproduce the results of the following work:

Thanathorn Sukprasert, Abel Souza, Noman Bashir, David Irwin, and Prashant Shenoy, "On the Limitations of Carbon-Aware Temporal and Spatial Workload Shifting in the Cloud," in the 19th European Conference on Computer Systems (EuroSys).

In this work, we conduct a detailed trace-driven analysis to understand the benefits and limitations of spatiotemporal workload scheduling for cloud workloads with different characteristics, such as job duration and deadlines. The analysis is based on hourly variations in the carbon intensity of electricity over three years across 123 distinct regions, which encompass most major cloud sites. For more information, please refer to the paper.
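As a toy illustration of the temporal-shifting idea studied in the paper, the sketch below delays a fixed-length, non-interruptible job within a slack window to the lowest-carbon start hour. The trace values and the helper name `best_start_hour` are invented for illustration and are not part of this codebase.

```python
import numpy as np

def best_start_hour(carbon, job_len, slack):
    """Return the start offset (0..slack) that minimizes total carbon for a
    non-interruptible job of job_len hours, given an hourly carbon trace."""
    # Total carbon emitted for each feasible start offset within the slack window.
    totals = [carbon[s:s + job_len].sum() for s in range(slack + 1)]
    return int(np.argmin(totals))

# Hypothetical hourly carbon-intensity trace (gCO2/kWh) for 24 hours.
trace = np.array([500, 480, 450, 400, 350, 300, 280, 300,
                  350, 420, 480, 520, 540, 530, 500, 460,
                  420, 380, 350, 330, 340, 380, 430, 480])

start = best_start_hour(trace, job_len=4, slack=8)
baseline = trace[:4].sum()               # run immediately at hour 0
shifted = trace[start:start + 4].sum()   # run at the chosen start hour
savings = 1 - shifted / baseline         # relative carbon savings
```

The real simulations additionally model interruptibility (splitting the job across low-carbon hours), which this sketch deliberately omits.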


This repository is also available on Zenodo

Requirements

  • Ubuntu 20.04+
  • Python 3.8+

We have run this experiment in the above setting; however, the codebase should work on any Unix-based system with Python 3.8+ installed.

In addition, we provide a requirements.txt listing the required Python modules.
We suggest creating a Python virtual environment and installing the modules inside it.

Python Modules

  • pandas
  • numpy
  • scikit-learn
  • matplotlib
  • seaborn

Getting Started

1. Creating Virtual Environment

In the directory where you want to run the experiments:

Create a virtual environment:

python3 -m venv .venv

where .venv is the name of the virtual environment.

To activate the virtual environment:

source .venv/bin/activate

To install the requirements:

pip3 install -r requirements.txt

To deactivate the virtual environment:

deactivate

2. Raw Data Sources

Raw Data Processing

process_raw_data: This directory contains scripts to process the raw carbon intensity and latency data.

shared_data: This directory is where the processed carbon intensity data will be saved or copied to (see the instructions above). It also contains a sample of the raw and processed Google latency matrix, as well as the emission factors from Electricity Maps.

All raw and processed data are stored in this directory and shared across the experiments.
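For orientation, the sketch below shows the kind of preprocessing the scripts in process_raw_data perform: parsing timestamps and aggregating a carbon-intensity signal to hourly resolution with pandas. The column names and sample values are hypothetical, not the actual trace format.

```python
import pandas as pd

# Hypothetical raw sub-hourly readings for one region; the real traces come
# from the Electricity Maps files under shared_data/.
raw = pd.DataFrame({
    "datetime": pd.date_range("2021-01-01", periods=8, freq="30min"),
    "carbon_intensity": [420, 430, 400, 390, 380, 370, 500, 510],
})

raw["datetime"] = pd.to_datetime(raw["datetime"], utc=True)
hourly = (raw.set_index("datetime")["carbon_intensity"]
             .resample("1h").mean()   # aggregate to hourly resolution
             .interpolate())          # fill any gaps in the signal
```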

3. Running Experiments

The directories that start with sim_ contain the different groups of simulations.

The table below lists each main simulation directory, the sub-directories that contain the individual simulations, and the corresponding figures in the paper.

| Main Simulation Directory Name | Sub-directories | Figure(s) |
| --- | --- | --- |
| sample_trace | carbon_trace, energy_mix | 1(a)-(b) |
| trace_analysis | mean_and_cv, change_over_time, periodicity | 3(a)-(b), 4 |
| spatial | geo_grouping_capacity, global_idle_capacity, capacity_and_latency, one_and_inf | 5(a)-(c), 6(a)-(b) |
| temporal | deferrability, interruptibility, deferrability_and_interruptibility_combined, job_length_distribution, vary_slack | 7(a)-(b), 8(a)-(b), 9(a)-(b), 10(a)-(b) |
| what_ifs | mixed_workload, prediction_error, greener, temporal_spatial_combined | 11(a)-(d), 12 |

Each simulation is run inside its own sub-directory.
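As a toy illustration of the spatial (sim_spatial) strategies, the sketch below greedily runs the work in whichever region is greenest each hour, ignoring the migration latency and capacity constraints that the actual simulations model. The region names and values are invented.

```python
import numpy as np

# Hypothetical hourly carbon intensity (gCO2/kWh) for three regions over 6 hours.
regions = {
    "region_a": np.array([500, 480, 460, 470, 490, 510]),
    "region_b": np.array([300, 320, 340, 310, 300, 330]),
    "region_c": np.array([450, 200, 210, 400, 420, 190]),
}

names = list(regions)
traces = np.vstack(list(regions.values()))

# Greedy spatial shifting: each hour, pick the currently greenest region.
greenest = traces.min(axis=0)
chosen = [names[i] for i in traces.argmin(axis=0)]

baseline = traces[0]                          # stay in region_a the whole time
savings = 1 - greenest.sum() / baseline.sum() # relative carbon savings
```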

To access the main simulation directory:

cd sim_<directory name>

To access the simulation sub-directory:

cd <simulation sub-directory name>

In the simulation directory, to run any script:

python3 <file_name>

The calculated result for each simulation will be stored in their own data_output directory and the plots of each calculated result will be stored in their own plot_output directory.

For example, to calculate mean and CV:

  1. From the main directory, go to the trace_analysis directory:
cd sim_trace_analysis
  2. In the trace_analysis directory, go to the mean_and_cv directory:
cd mean_and_cv
  3. In the mean_and_cv directory, calculate the mean and CV of the carbon intensity signal:
python3 calculate_mean_and_cv.py
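For intuition, the two statistics computed here can be sketched as follows on a hypothetical hourly signal; calculate_mean_and_cv.py computes the same quantities from the real traces.

```python
import numpy as np

# Hypothetical hourly carbon-intensity signal for one region (gCO2/kWh).
signal = np.array([400.0, 350.0, 300.0, 450.0, 500.0, 420.0])

mean = signal.mean()
cv = signal.std() / mean  # coefficient of variation: variability relative to the mean
```

A high mean means a dirty region overall; a high CV means large swings that temporal shifting can exploit.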


License

  • The Python codebase available here follows the Apache v2 License unless otherwise stated.
  • The Google Latency dataset was created by the AT&T Center for Virtualization at Southern Methodist University and follows the Apache v2 License.
  • The Electricity Maps carbon intensity data is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/.