The aim of this repository is to provide the code to reproduce the results of the following work:
Sukprasert, Thanathorn and Souza, Abel and Bashir, Noman and Irwin, David and Shenoy, Prashant, "On the Limitations of Carbon-Aware Temporal and Spatial Workload Shifting in the Cloud", in the 19th European Conference on Computer Systems (EuroSys)
In this work, we conduct a detailed trace-driven analysis to understand the benefits and limitations of spatiotemporal workload scheduling for cloud workloads with different characteristics, e.g., job duration, deadlines, etc., based on hourly variations in energy's carbon-intensity over three years across 123 distinct regions, which encompass most major cloud sites. For more information, please refer to the paper.
This repository is also available on Zenodo
- Ubuntu 20.04+
- Python 3.8+
We have run this experiment with in the above setting. However, this codebase should work on any Unix based system with Python 3.8+ installed.
In addition, we have generated a requirements.txt for the required Python modules.
We suggest you to create and load a Python virtual environment and install modules inside of this virtual environment.
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
In the direction that you want to run:
Create a virtual environment:
python3 -m venv .venv
where the .venv is the name of the virtual environment
To activate the virtual environment
source .venv/bin/activate
To install the requirements
pip3 install -r requirements.txt
To deactivate the virtual environment
deactivate
-
Carbon Intensity:
-
Downloading carbon intensity data: https://www.electricitymaps.com/data-portal
- To download the data for each region, select the region then select 'Get data'
- At the 'Year' filter, select the year
- At the 'Interval' select
hourly
- To download the file for each zone in the region click the download button.
- Enter the information and click 'Submit', the click 'Download data'
- Place the downloaded data in the downloaded_carbon_data directory
- Process the downloaded data in process_raw_data directory
-
Alternatively, the carbon intensity data from year 2020-2022 are provided in the provided_data/combined_carbon.csv directory
-
If the downloaded data is used, the processed data from the process_raw_data/process_data.py directory will be saved to the shared_data directory.
-
Otherwise, copy the provided_data/combined_carbon.csv to the shared_data directory.
-
-
Google Latency: https://lookerstudio.google.com/reporting/fc733b10-9744-4a72-a502-92290f608571/page/p_854mo2jmcd (from the AT&T Center for Virtualization at Southern Methodist University)
-
Google Trace: https://github.com/google/cluster-data (version 3)
-
Azure Trace: https://github.com/Azure/AzurePublicDataset (V2)
process_raw_data: This directory contains scripts to process the raw carbon intensity and latency data.
shared_data: This directory is where the processed carbon intensity data will be saved or copied to (See the instruction above). This directory also contains a sample of raw and processed Google latency matrix as well as the emission factors from Electricity Maps.
All the raw and processed data are stored in this directory. The data is shared across multiple experiments.
The directories that starts with sim_
are the directories with different groups of simulations.
The table below shows the directory names that start with sim
, their sub-directories that contain the simulations as well as the respective figures.
Main Simulation Diretory Name | Sub-directories | Figure(s) |
---|---|---|
sample_trace | carbon_trace, energy_mix | 1(a)-(b) |
trace_analysis | mean_and_cv, change_over_time, periodicity | 3(a)-(b), 4 |
spatial | geo_grouping_capacity, global_idle_capacity, capacity_and_latency, one_and_inf | 5(a)-(c), 6(a)-(b) |
temporal | deferrability, interruptibility, deferrability_and_interruptibility_combined, job_length_distribution, vary_slack | 7(a)-(b), 8(a)-(b), 9(a)-(b), 10(a)-(b) |
what_ifs | mixed_workload, prediction_error, greener, temporal_spatial_combined | 11(a)-(d), 12 |
Each simulation is run inside its own sub-directory.
To access the main simulation directory:
cd sim_<directory name>
To access the simulation directory
cd <simulation sub-directotry name>
In the simulation directory, to run any script:
python3 <file_name>
The calculated result for each simulation will be stored in their own data_output
directory and the plots of each calculated result will be stored in their own plot_output
directory.
For example to calculate mean and CV:
- From the main directory, go to the trace_analysis directory by
cd sim_trace_analysis
- In the trace_analysis directory go to mean_and_cv directory by
cd mean_and_cv
- In the mean_and_cv directory, to calculate mean and CV of the carbon intensity signal, run
python3 calculate_mean_and_cv.py
-
The shared_data directory contains data that are sharedacross multiple experiments.
-
global_modules: Contains helper functions, flag images, and fonts.
-
The flag images are from: https://github.com/gosquared/flags/blob/master/flags/flags-iso/flat/16/US.png
- The Python codebase available in here follows the Apache v2 License unless otherwise stated.
- The Google Latency dataset has been created by the from the AT&T Center for Virtualization at the Southern Methodist University and follows the Apache v2 License.
- The Electricity Maps Carbon Intensity is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/