This repository contains a representative subset of the first-party virtual machine (VM) workload of Microsoft Azure in one of its geographical regions. The trace is a sanitized subset of the Azure VM workload described in "Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms" in SOSP’17. We include in this repository a Jupyter notebook that directly compares the main characteristics of this trace with those of the workload described in the paper, showing that the two are qualitatively very similar.
We provide the trace as is, but are willing to help researchers understand and use it. So, please let us know of any issues or questions by sending email to our mailing list.
If you do use this trace in your research, please make sure to cite our SOSP’17 paper (mentioned above).
The trace contains a representative subset of the first-party Azure VM workload in one geographical region. The main trace characteristics and schema are:
- Dataset size: 117 GB
- Compressed dataset size: 78.5 GB
- Number of files: 128 files
- Duration: 30 consecutive days
- Total number of VMs: 2,013,767
- Total number of Azure subscriptions: 5,958
- Timeseries data: 5-minute VM CPU utilization readings, VM information table and subscription table (with main fields encrypted)
- Total VM hours: 104,371,713
- Total number of VM CPU utilization readings: 1,246,539,221
- Total virtual core hours: 237,815,104
The schema fields are:
- Encrypted subscription id
- Encrypted deployment id
- Timestamp in seconds (starting from 0) when the first VM was created
- Count of VMs created
- Deployment size (we define a “deployment” differently than Azure in our paper)
- Encrypted VM id
- Timestamp when the VM was created
- Timestamp when the VM was deleted
- Max CPU utilization
- Avg CPU utilization
- P95 of Max CPU utilization
- VM category
- VM virtual core count
- VM memory (GB)
- Timestamp in seconds (every 5 minutes)
- Min CPU utilization during the 5 minutes
- Max CPU utilization during the 5 minutes
- Avg CPU utilization during the 5 minutes
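
As a rough illustration of how the fields listed above relate to the headline totals (VM hours, virtual core hours) and to the 5-minute readings, here is a minimal Python sketch. The class names, field names, and sample values are hypothetical and chosen only for this example. The sketch does not assume anything about the actual file names, column order, or headers in the trace, and it assumes the readings passed to `hourly_rollup` have already been selected for a single VM.

```python
from collections import defaultdict
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600


@dataclass
class VM:
    # Hypothetical field names mirroring the VM fields listed above.
    vm_id: str    # encrypted VM id
    created: int  # timestamp VM created, in seconds from trace start
    deleted: int  # timestamp VM deleted, in seconds from trace start
    cores: int    # VM virtual core count


@dataclass
class Reading:
    # Hypothetical field names mirroring the 5-minute reading fields.
    timestamp: int  # seconds, one reading every 300 s
    min_cpu: float
    max_cpu: float
    avg_cpu: float


def vm_hours(vm: VM) -> float:
    """Lifetime of one VM in hours."""
    return (vm.deleted - vm.created) / SECONDS_PER_HOUR


def core_hours(vm: VM) -> float:
    """Lifetime of one VM weighted by its virtual core count."""
    return vm_hours(vm) * vm.cores


def hourly_rollup(readings):
    """Group 5-minute readings into hourly (min, max, avg) tuples.

    Averaging the per-reading averages is only valid because every
    reading is assumed to cover an interval of the same length.
    """
    buckets = defaultdict(list)
    for r in readings:
        buckets[r.timestamp // SECONDS_PER_HOUR].append(r)
    return {
        hour: (
            min(r.min_cpu for r in rs),
            max(r.max_cpu for r in rs),
            sum(r.avg_cpu for r in rs) / len(rs),
        )
        for hour, rs in sorted(buckets.items())
    }


# Made-up sample data, not taken from the trace.
sample_vms = [VM("vm-a", 0, 7200, 2), VM("vm-b", 1800, 5400, 4)]
sample_readings = [
    Reading(0, 1.0, 10.0, 5.0),
    Reading(300, 2.0, 20.0, 7.0),
    Reading(3600, 3.0, 30.0, 9.0),
]

total_vm_hours = sum(vm_hours(vm) for vm in sample_vms)
total_core_hours = sum(core_hours(vm) for vm in sample_vms)
hourly = hourly_rollup(sample_readings)
```

Summing `vm_hours` and `core_hours` over all VMs gives totals of the same kind as the "Total VM hours" and "Total virtual core hours" figures above. Whether the published totals were computed exactly this way is not stated here.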
You can download the dataset from Azure Blob Storage using the links available here.
This trace derives from a collaboration between Azure and Microsoft Research.