NURD: Negative-Unlabeled Learning for Online Datacenter Straggler Prediction

This is the Python implementation for experiments in NURD: Negative-Unlabeled Learning for Online Datacenter Straggler Prediction.

Requirements

Data

The data folder includes code for preprocessing Google and Alibaba trace data.

Google trace data

Google trace data is collected from Google's infrastructure. It can be used to track the performance of Google's services, identify bottlenecks, and improve the overall user experience.
The Google trace dataset can be downloaded from https://github.com/google/clusterdata

The trace data is collected from a variety of sources, including:

Applications: Google's applications, such as Gmail, YouTube, and Google Search, generate trace data that can be used to track the performance of these applications.
Infrastructure: Google's infrastructure, such as data centers and networks, also generates trace data that can be used to track the performance of these systems.
Users: Google's users also generate trace data, such as the websites they visit, the videos they watch, and the searches they perform.

Alibaba trace dataset

The Alibaba trace dataset can be downloaded from https://github.com/alibaba/clusterdata
The Alibaba Cluster Trace Program is published by Alibaba Group. Cluster-trace-v2017 covers about 1300 machines over a period of 12 hours. It is the first trace in the program to include the colocation of online services (also known as long-running applications) and batch workloads.

Code

run_ts.py includes implementations for the following methods:

Base (gb): A basic learner that is trained on observed tasks and predicts stragglers among unseen tasks. Uses a gradient boosting tree model.
LR (log): A logistic regression model that is trained on observed tasks and predicts stragglers among unseen tasks.
OS (os): A complete solution for straggler prediction, proposed in Wrangler, that uses linear support vector machines with oversampling to compensate for the scarcity of stragglers in the training data.
DS (ds): A variant of OS that uses downsampling instead of oversampling.
SS-EN (en): A semi-supervised learning method proposed by Elkan and Noto.
SS-BG (bg): A bagging-based semi-supervised learning method by Mordelet and Vert.
Tobit (tb): Tobit regression model for censored data.
Grabit (kt): Gradient tree-boosted Tobit model by Sigrist and Hirnschall.
IPW (gb-ipw): The NURD method proposed in the paper.
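
The OS and DS entries above differ only in how they rebalance the training set before fitting a classifier. The following is a minimal sketch of that idea using only the standard library. It is not the code in run_ts.py: the function name `rebalance` and its `mode` and `seed` parameters are hypothetical and chosen for illustration, and label 1 is assumed to mark a straggler.

```python
import random


def rebalance(rows, labels, mode="over", seed=0):
    """Return (rows, labels) with equal class counts.

    mode="over": resample the minority class with replacement
                 until it matches the majority class size.
    mode="down": subsample the majority class without replacement
                 until it matches the minority class size.
    Assumption: label 1 marks a straggler, label 0 a non-straggler.
    """
    rng = random.Random(seed)
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    # Decide which class is the minority before any resampling.
    if len(pos) <= len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0

    if mode == "over":
        extra = rng.choices(minority, k=len(majority) - len(minority))
        minority = minority + extra
    elif mode == "down":
        majority = rng.sample(majority, k=len(minority))
    else:
        raise ValueError("mode must be 'over' or 'down'")

    pairs = [(r, minority_label) for r in minority]
    pairs += [(r, 1 - minority_label) for r in majority]
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [y for _, y in pairs]


# Toy data: 2 stragglers among 10 tasks, one feature per task.
rows = [[i] for i in range(10)]
labels = [1, 1] + [0] * 8
over_rows, over_labels = rebalance(rows, labels, mode="over")
down_rows, down_labels = rebalance(rows, labels, mode="down")
```

Oversampling keeps every observed task but repeats stragglers, while downsampling discards non-stragglers, so the two trade training-set size against duplicated samples.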

run_od.py includes implementations for all the outlier detection methods.
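
This README does not name the outlier detection methods in run_od.py, so the sketch below only illustrates the general idea of treating stragglers as latency outliers. It uses the modified z-score based on the median absolute deviation, which is a standard technique and not necessarily one of the repository's methods. The function name `mad_outliers` and the sample latencies are hypothetical.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is
    the median absolute deviation. Returns one boolean per input value.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # All deviations are zero at the median; nothing can be scored.
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


# Hypothetical task latencies; the last task is much slower than the rest.
latencies = [10, 11, 9, 10, 12, 10, 11, 50]
flags = mad_outliers(latencies)
```

A median-based score is used here because a single extreme latency shifts the median and MAD far less than it shifts the mean and standard deviation.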
