
This is the repository of the Huawei Public Cloud and Huawei Private Cloud datasets.


Huawei Public Cloud and Huawei Private Cloud data release

This is the repository for the paper "How Does It Function? Characterizing Long-term Trends in Production Serverless Workloads", published at ACM SoCC 2023.

The paper analyzes the Huawei Public Cloud and Huawei Private Cloud datasets, which are available for download below.

We also provide two Jupyter notebooks (demo_private.ipynb and demo_public.ipynb) that show how to load the data as a pandas DataFrame and make plots.

Download/read our paper

Conference video presentation

How to download the data

The datasets used in our paper can be downloaded here:

Huawei Private

This dataset contains 141 days of data, collected over a 235-day period, for 200 functions, combined across all availability zones of our private cloud.

| Metric | Minute | Second | Description |
|---|---|---|---|
| Function ID | - | - | Unique function identifier out of 200 (0-199) |
| Timestamp | - | - | Timestamp in seconds (0-20303940) |
| Requests | Requests per minute | Requests per second | Number of function invocations |
| Function delay | Function delay per minute | Function delay per second | Function execution time, averaged over all pods running that function |
| Platform delay | Platform delay per minute | Platform delay per second | Scheduling time plus some network overheads, averaged over all pods running that function |
| CPU usage | CPU usage per minute | N/A | Percentage of allocated CPU used by the function, averaged over all pods |
| Memory usage | Memory usage per minute | N/A | Percentage of allocated memory used by the function, averaged over all pods |
| CPU limit | CPU limit per minute | N/A | CPU allocated to all pods running that function (normalized) |
| Memory limit | Memory limit per minute | N/A | Memory allocated to all pods running that function (MB) |
| Instances | Instances per minute | N/A | Number of pods allocated to that function |

Note: For Huawei Private, requests, function delay, and platform delay are originally recorded per second. We provide aggregated per-minute versions of these metrics for convenience: requests per minute are the sum of the per-second request counts over each 60-second window, and function and platform delay per minute are the mean of the per-second values over each 60-second window.
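The per-minute aggregation described in the note can be reproduced roughly as follows. This is a minimal sketch that assumes a hypothetical in-memory layout (a `time` column with timestamps in seconds plus one column per function ID); the actual CSV columns are not described here, so check them against the demo notebooks before reusing it. The helper name `to_minute` is illustrative only.

```python
import numpy as np
import pandas as pd


def to_minute(per_second: pd.DataFrame, how: str) -> pd.DataFrame:
    """Aggregate per-second values into 60-second windows (hypothetical helper).

    Assumed layout: a 'time' column with timestamps in seconds plus one
    column per function ID. Use how="sum" for requests and how="mean" for
    function or platform delay, as the note describes.
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    # Start of the 60-second window each row falls into (0, 60, 120, ...).
    window_start = (per_second["time"] // 60) * 60
    return per_second.drop(columns="time").groupby(window_start).agg(how)


# Two minutes of made-up per-second data for two functions.
seconds = np.arange(120)
per_second = pd.DataFrame({
    "time": seconds,
    "0": np.ones(120, dtype=int),  # one request every second
    "1": seconds.astype(float),    # a delay that grows linearly
})
requests_minute = to_minute(per_second[["time", "0"]], "sum")  # 60 per minute
delay_minute = to_minute(per_second[["time", "1"]], "mean")    # 29.5, then 89.5
```

Summing suits counts (requests add up within a minute), while averaging suits latencies, which is why the two kinds of metric use different aggregations.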

Huawei Public

This dataset contains 26 days of data for 5093 functions from a single availability zone of our public cloud.

| Metric | Minute | Description |
|---|---|---|
| Function ID | - | Unique function identifier out of 5093 (0-5092) |
| Timestamp | - | Timestamp in seconds (0-2246340) |
| Requests | Requests per minute | Number of function invocations |
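The timestamp ranges line up with the daily files in the folder layout: 235 × 86,400 s = 20,304,000 s, so the last private minute starts at 20,304,000 − 60 = 20,303,940, and 26 × 86,400 − 60 = 2,246,340 for the public dataset. Assuming each day_*.csv file holds exactly one 86,400-second day starting at timestamp 0 (an assumption inferred from these ranges, not stated by the dataset authors), a timestamp can be mapped to its file like this; `day_file` is a hypothetical helper.

```python
SECONDS_PER_DAY = 86_400


def day_file(timestamp: int, dataset: str) -> str:
    """Return the day_*.csv name assumed to contain a timestamp (hypothetical).

    Assumption: day N covers timestamps [N * 86400, (N + 1) * 86400).
    Private files use three digits (day_000 ... day_234), public files
    use two digits (day_00 ... day_25).
    """
    if timestamp < 0:
        raise ValueError("timestamps start at 0")
    day = timestamp // SECONDS_PER_DAY
    width = {"private": 3, "public": 2}[dataset]
    return f"day_{day:0{width}d}.csv"


print(day_file(20_303_940, "private"))  # day_234.csv
print(day_file(2_246_340, "public"))    # day_25.csv
```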

After downloading the data, the folder structure should look like this.

.
├── demo_private.ipynb
├── demo_public.ipynb
└── datasets
    ├── private_dataset
    │   ├── cpu_limit_minute
    │   │   ├── day_000.csv
    │   │   ├── day_001.csv
    │   │   ├── ... 
    │   │   ├── day_233.csv
    │   │   └── day_234.csv
    │   ├── cpu_usage_minute
    │   │   ├── day_000.csv
    │   │   ├── day_001.csv
    │   │   ├── ...
    │   │   ├── day_233.csv
    │   │   └── day_234.csv
    │   ├── function_delay_minute
    │   │   ├── day_000.csv
    │   │   ├── day_001.csv
    │   │   ├── ...
    │   │   ├── day_233.csv
    │   │   └── day_234.csv
    │   ├── function_delay_second
    │   │   ├── day_000.csv
    │   │   ├── day_001.csv
    │   │   ├── ...
    │   │   ├── day_233.csv
    │   │   └── day_234.csv
    │   ├── instances_minute
    │   │   ├── day_000.csv
    │   │   ├── day_001.csv
    │   │   ├── ...
    │   │   ├── day_233.csv
    │   │   └── day_234.csv
    │   ├── memory_limit_minute
    │   │   ├── day_000.csv
    │   │   ├── day_001.csv
    │   │   ├── ...
    │   │   ├── day_233.csv
    │   │   └── day_234.csv
    │   ├── memory_usage_minute
    │   │   ├── day_000.csv
    │   │   ├── day_001.csv
    │   │   ├── ...
    │   │   ├── day_233.csv
    │   │   └── day_234.csv
    │   ├── platform_delay_minute
    │   │   ├── day_000.csv
    │   │   ├── day_001.csv
    │   │   ├── ...
    │   │   ├── day_233.csv
    │   │   └── day_234.csv
    │   ├── platform_delay_second
    │   │   ├── day_000.csv
    │   │   ├── day_001.csv
    │   │   ├── ...
    │   │   ├── day_233.csv
    │   │   └── day_234.csv
    │   ├── requests_minute
    │   │   ├── day_000.csv
    │   │   ├── day_001.csv
    │   │   ├── ...
    │   │   ├── day_233.csv
    │   │   └── day_234.csv
    │   └── requests_second
    │       ├── day_000.csv
    │       ├── day_001.csv
    │       ├── ...
    │       ├── day_233.csv
    │       └── day_234.csv
    └── public_dataset
        └── requests_minute
            ├── day_00.csv
            ├── day_01.csv
            ├── ...
            ├── day_24.csv
            └── day_25.csv
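Given this layout, one metric folder can be loaded into a single pandas DataFrame by reading its daily files in order and concatenating them. The sketch below is an assumption-laden illustration, not the loading code from the demo notebooks: `load_metric` is a hypothetical helper, and it only assumes that all daily files of a metric share the same columns. The demo builds a throwaway folder with two made-up files rather than touching the real data.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

import pandas as pd


def load_metric(root: str | Path, dataset: str, metric: str) -> pd.DataFrame:
    """Concatenate every day_*.csv of one metric folder (hypothetical helper).

    Assumes the layout datasets/<dataset>/<metric>/day_*.csv and that all
    daily files share the same columns.
    """
    folder = Path(root) / "datasets" / dataset / metric
    # Zero-padded names (day_000, day_001, ...) sort chronologically.
    files = sorted(folder.glob("day_*.csv"))
    if not files:
        raise FileNotFoundError(f"no day_*.csv files in {folder}")
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)


# Demo on a temporary folder with two tiny, made-up daily files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "datasets" / "public_dataset" / "requests_minute"
    folder.mkdir(parents=True)
    pd.DataFrame({"time": [0, 60], "0": [3, 5]}).to_csv(folder / "day_00.csv", index=False)
    pd.DataFrame({"time": [86_400, 86_460], "0": [2, 7]}).to_csv(folder / "day_01.csv", index=False)
    demo = load_metric(tmp, "public_dataset", "requests_minute")
```

On the real data, a call such as `load_metric(".", "private_dataset", "requests_minute")` would read all 235 private files; the per-second folders are much larger, so loading only a range of days may be preferable there.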