
If you use Sinan in your research, please cite our ASPLOS'21 paper.

author = {Zhang, Yanqi and Hua, Weizhe and Zhou, Zhuangzhuang and Suh, G. Edward and Delimitrou, Christina},
title = {Sinan: ML-Based & QoS-Aware Resource Management for Cloud Microservices},
booktitle = {Proceedings of the Twenty-Sixth International Conference on Architectural Support for Programming Languages and Operating Systems},
series = {ASPLOS '21}


  • Python 3.5+
  • Python 2.7 (for plotting)
  • Install and set up the Google Cloud SDK. To reproduce the results presented in the paper, the CPU quota (Compute Engine API) of your Google Cloud project should be no less than 500.

Code structure


The benchmarks directory contains the source code of the tested benchmarks.

For the SocialNetwork application (benchmarks/socialNetwork-ml-swarm), we added two compute-intensive machine learning microservices (text-filter and media-filter) and added image data to user posts (previously, posts only included text) to make the application more representative than the original version. We also provide a warm-up script (under benchmarks/socialNetwork-ml-swarm/) for the application, which fills the social-network friendship graph and populates the databases with posts.


configuration files of cluster, scheduling actions, and inference engine


scripts to generate config files


scripts for running experiments


code for workload generation


ML models and scripts for data preparation, training, deployment and fine-tuning (for different user workload patterns). The complete flow includes the following steps:

  • collect training data (shortcut script in exp_scripts/)
  • process collected data with
  • train the CNN & XGBoost models with the processed data ( & ). The architectures of the neural networks are in the model directory.
  • deploy online with running microservices (
  • fine-tune the model to adapt to changes, e.g. cluster changes or workload skews (


utilization functions


initialization scripts for GCE VMs

root directory

  • master for data collection
  • master for running the deployment experiment of the social network benchmark
  • master for running the deployment experiment of the social network benchmark with a diurnal requests-per-second (rps) pattern
  • slave for data collection & deployment
  • set up the gcloud cluster & collect data
  • set up the gcloud cluster & deploy the social network benchmark
  • set up the gcloud cluster & deploy the social network benchmark with a diurnal rps pattern


The following instructions assume that users start from the git root directory. Before executing any shell script, make sure to clone the repo to your home directory and change the '--username' argument in the shell script to your own Google Cloud user name. When the scripts finish, the system execution log should be in the logs directory of the master node. Users can copy the data to a local machine with scp (the ssh keys are automatically generated and stored in the keys directory).
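As a rough illustration of the scp step, the sketch below assembles the command in Python. The function name and all of its arguments (key file, user name, host, remote and local directories) are hypothetical placeholders, not names defined by this repo; only the standard scp flags `-i` (identity file) and `-r` (recursive copy) are relied on.

```python
def build_scp_command(key_path, username, host, remote_dir, local_dir):
    """Build an scp argument list that recursively copies remote_dir to local_dir.

    All arguments are placeholders supplied by the caller; this repo does not
    define them. -i selects the private key, -r copies directories recursively.
    """
    remote = "{}@{}:{}".format(username, host, remote_dir)
    return ["scp", "-i", key_path, "-r", remote, local_dir]


# Example with made-up values; substitute your own key file, user, and master IP.
cmd = build_scp_command("keys/my_key", "alice", "203.0.113.7", "~/repo/logs", "./logs_copy")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to perform the copy.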

Deployment experiment with static rps and identical workload

This script tests the deployment of Sinan under static rps, with the same workload composition as the training data (w0 in Figure 13 and Figure 14 in the paper). For detailed information on workload characterization, please check locust/src/

cd exp_scripts

Deployment experiment with static rps and skewed workload

This script tests the deployment of Sinan under static rps, with the workload composition slightly skewed from training data (w1-w3 in Figure 13 and Figure 14 in the paper). For detailed information on workload characterization, please check locust/src/social_rps_10_v*.py

cd exp_scripts
./ x

x can be set to 1, 2, or 3.

Deployment experiment with diurnal rps pattern

This script tests the deployment of Sinan under a diurnal rps pattern.

cd exp_scripts
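For intuition about what a diurnal rps pattern looks like, here is a minimal sketch that models the load as a sinusoid around a base rate. This is an assumption made purely for illustration: the function, its parameters, and the sinusoidal shape are hypothetical and are not taken from the repo's workload generator.

```python
import math


def diurnal_rps(t_seconds, base_rps, amplitude, period_seconds=86400):
    """Return an illustrative target rps at time t for a sinusoidal daily cycle.

    Hypothetical model: load oscillates around base_rps by +/- amplitude over
    one period (24 hours by default) and is clamped so it never goes negative.
    """
    phase = 2 * math.pi * t_seconds / period_seconds
    return max(0.0, base_rps + amplitude * math.sin(phase))


# Sample the curve every 6 hours over one day.
samples = [diurnal_rps(h * 3600, base_rps=100, amplitude=50) for h in (0, 6, 12, 18)]
```

Under this model the load peaks a quarter of the way through the period and bottoms out at three quarters.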

Data processing

python data_proc/ dir_name

This script calculates the average CPU usage, tail latencies & violation rates from the execution logs.
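The kind of summary described above can be sketched as follows. This is not the repo's script: the function names, the input format (plain lists of per-interval CPU usage and tail latencies in milliseconds), the nearest-rank percentile, and the definition of a violation as an interval whose latency exceeds a QoS target are all assumptions made for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def summarize(cpu_samples, latencies_ms, qos_ms):
    """Summarize a run: mean CPU usage, 99th-percentile latency, violation rate.

    Assumes one latency value per measurement interval; an interval counts as
    a violation when its latency exceeds the (hypothetical) QoS target qos_ms.
    """
    violations = sum(1 for lat in latencies_ms if lat > qos_ms)
    return {
        "avg_cpu": sum(cpu_samples) / len(cpu_samples),
        "p99_latency_ms": percentile(latencies_ms, 99),
        "violation_rate": violations / len(latencies_ms),
    }


# Made-up numbers, only to show the call shape.
summary = summarize([2.0, 4.0], [100, 200, 300, 600], qos_ms=500)
```

With the made-up inputs above, one of the four intervals exceeds the 500 ms target, so the violation rate is 0.25.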

python data_proc/ dir_name

This script plots the real-time CPU allocation of each service and the end-to-end latencies.

dir_name should be the directory that contains the system execution log, for example logs/deploy_data in the static load experiments. For the specific log directory, check the DataDir variable in the master_deploy* scripts.