dcbrain

License: Apache License 2.0 (Apache-2.0)

We release two datasets collected at Alibaba:

  • Hard disk drives (HDDs) (diskdata/): Includes more than 200,000 HDDs in Alibaba Cloud's data centers.

    • Publication: "Large-Scale Disk Failure Prediction" (book).
      Cheng He, Mengling Feng, Patrick P. C. Lee, Pinghui Wang, Shujie Han, Yi Liu.
      PAKDD 2020 Competition and Workshop, AI Ops 2020, February 7 – May 15, 2020, Revised Selected Papers
  • Solid-state drives (SSDs) (ssd_open_data/): Includes nearly one million SSDs across 11 drive models from three vendors over a two-year span.

    • Publication: "An In-Depth Study of Correlated Failures in Production SSD-Based Data Centers."
      Shujie Han, Patrick P. C. Lee, Fan Xu, Yi Liu, Cheng He, and Jiongzhou Liu.
      Proceedings of the 19th USENIX Conference on File and Storage Technologies (FAST 2021), February 2021.