Alibaba Block Traces

These traces are published by Alibaba Group to help researchers understand the real-world workload in the cloud.

They are collected from a cluster in production of the elastic block service of Alibaba Cloud (i.e. storage for virtual disks). The cluster is located in Beijing region, one of the most popular regions of Alibaba Cloud.

There are 1000 virtual disks randomly sampled from that cluster, and all their I/O activities are recorded over the month of January 2020. These virtual disks are ultra disk products. Ultra disks are backed by a storage cluster and offer high data reliability. Ultra disks are cheaper and offer lower random I/O performance, compared to standard SSD and enhanced SSD disks link. Typical applications of ultra disks are running operating systems, big data processing software, web servers, etc..

Download

The data are available for download from Alibaba OSS. You will get the download link after taking a short survey. If you have any questions or ideas about the trace data, feel free to contact us. The current maintainer is Chao Shi <chao.shi AT alibaba-inc.com>. We are happy to see research work based on the trace data.

Just a kind reminder, the tarball is very large, 181GB gzip-compressed and 751GB uncompressed, so make sure you have enough space on your disk.

Here are MD5 checksums of the tarball and files inside.

Filename MD5 checksum
alibaba_block_traces_2020.tar.gz 95780fc531a60fd4ca0513ef88ef469c
io_traces.csv c60dd8f771738d4d8df56271e56dd308
device_size.csv 6641abe8a0f3625f13776120d2884e84

Schema

There are two files in CSV format. Their file format is defined as follow.

io_traces.csv

Each row is a read or write operation.

Column Type Example Description
device_id uint32 0 ID of the virtual disk
opcode char R Either of 'R' or 'W', indicating this operation is read or write
offset uint64 126703644672 Offset of this operation, in bytes
length uint32 4096 Length of this operation, in bytes
timestamp uint64 1577808000000626 Timestamp of this operation received by server, in microseconds

device_size.csv

Each row is a device with is capacity.

Column Type Example Description
device_id uint32 0 ID of the virtual disk
capacity uint64 536870912000 Capacity of the virtual disk, in bytes

All IDs of virtual disks are re-mapped to the range of 0 - 999.

Research outcome

Here is a list of research work based on the trace data. If your paper uses the data, it would be great to let us know and add your work to this list.

Alibaba Innovative Reseach (AIR) program sponsors research every year on various area in computer science that solve the real problems in industry scenarios. If you have fancy ideas and are interested in participating in this program, feel free to contact Chao Shi <chao.shi@alibaba-inc.com>.

Acknowledgements

Thanks to Qiuping Wang and Jinhong Li from the Chinese University of Hong Kong for analyzing and validating the data at an early stage.

License

The trace data and document are licensed under CC-4.0.