FaaSFlow is a serverless workflow engine that enables efficient workflow execution in two ways: a worker-side workflow schedule pattern that reduces scheduling overhead, and an adaptive storage library that uses local memory to transfer data between functions on the same node.
In our experiment setup, we use an Aliyun ECS instance running Ubuntu 18.04 (ecs.g7.2xlarge, 8 cores, 32 GB DRAM) for each worker node, and an ecs.g6e.4xlarge instance (16 cores, 64 GB DRAM) running Ubuntu 18.04 and CouchDB as the database node.
Please save the private IP address of the storage node as `<master_ip>`, and the private IP addresses of the other 7 worker nodes as `<worker_ip>`.
Clone our code from https://github.com/lzjzx1122/FaaSFlow.git and then:
- Reset the `worker_address` configuration with your `<worker_ip>:8000` in `src/grouping/node_info.yaml`. It specifies your workers' addresses. The `scale_limit: 120` entry is the maximum number of containers that can be deployed on each 32 GB-memory instance and does not need to be changed by default.
- Reset `COUCHDB_URL` to `http://openwhisk:openwhisk@<master_ip>:5984/` in `src/container/config.py`, `src/workflow_manager/config.py`, and `test/asplos/config.py`. It specifies the database storage you built previously (a sketch of this one-line edit appears after this list).
- Then, clone the modified code onto each node (8 nodes in total).
- On the storage node: run `scripts/db_setup.bash`. It installs Docker, CouchDB, and some Python packages, and builds the grouping results for the 8 benchmarks.
- On each worker node: run `scripts/worker_setup.bash`. It installs Docker, Redis, and some Python packages, and builds the Docker images for the 8 benchmarks.
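For reference, the `COUCHDB_URL` change described in the list above is a one-line assignment repeated in each of the three config files. The sketch below shows only this line; the surrounding contents of the files may differ.

```python
# Database setting in src/container/config.py, src/workflow_manager/config.py,
# and test/asplos/config.py (sketch; other settings in these files are omitted).
# Replace <master_ip> with the private IP of the storage node.
COUCHDB_URL = 'http://openwhisk:openwhisk@<master_ip>:5984/'
```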
The following operations help to run scripts under WorkerSP. First, enter `src/workflow_manager` and change the configuration to `DATA_MODE = optimized` and `CONTROL_MODE = WorkerSP` on all 7 worker nodes and the storage node (a consolidated sketch of these settings is shown below). Then, start the engine proxy with the local `<worker_ip>` on each worker node by the following command:

```
python3 proxy.py <worker_ip> 8000
```

(this is the proxy start command referred to later)
Enter `test/asplos/config.py` and define `GATEWAY_ADDR` as `<master_ip>:7000`. Then start the gateway on the storage node by the following command:

```
python3 gateway.py <master_ip> 7000
```

(this is the gateway start command referred to later)
If you would like to run scripts under WorkerSP, you have now finished all the required operations and can send invocations through the `run.py` scripts for all WorkerSP-based performance tests. Detailed script usage is introduced in Run Experiment.
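For reference, the WorkerSP-related settings edited above boil down to a few assignments. The sketch below assumes the options are stored as plain strings; the actual files in the repository may declare them differently.

```python
# Sketch only; verify the exact spelling against the files in the repository.

# src/workflow_manager/config.py -- on all 7 worker nodes and the storage node
DATA_MODE = 'optimized'    # pass intermediate data through local memory where possible
CONTROL_MODE = 'WorkerSP'  # worker-side workflow scheduling

# test/asplos/config.py -- on the storage node
GATEWAY_ADDR = '<master_ip>:7000'  # gateway address used by the test scripts
```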
Note: We recommend restarting `proxy.py` on each worker node and `gateway.py` on the master node whenever you start the `run.py` script, to avoid any potential bug.
The following operations help to run scripts under MasterSP. First, enter `src/workflow_manager` and change the configuration to `DATA_MODE = raw` and `CONTROL_MODE = MasterSP` on all 7 worker nodes and the storage node (a sketch of the MasterSP settings is shown below). Then, restart the engine proxy on each worker node with the proxy start command, and restart the gateway on the storage node with the gateway start command.
Enter `src/workflow_manager/config.py` and define `MASTER_HOST` as `<master_ip>:8000`. Then, start another proxy on the storage node as the virtual master node by the following command:

```
python3 proxy.py <master_ip> 8000
```
If you would like to run scripts under MasterSP, you have now finished all the required operations and can send invocations through the `run.py` scripts for all MasterSP-based performance tests. Detailed script usage is introduced in Run Experiment.
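Likewise, the MasterSP deployment boils down to the following settings; a minimal sketch under the same assumption about string values:

```python
# Sketch only; verify the exact spelling against the files in the repository.

# src/workflow_manager/config.py -- on all 7 worker nodes and the storage node
DATA_MODE = 'raw'          # disable the local-memory data-passing optimization
CONTROL_MODE = 'MasterSP'  # master-side workflow scheduling
MASTER_HOST = '<master_ip>:8000'  # the virtual master proxy started on the storage node
```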
We provide some test scripts under `test/asplos`.
Note: We recommend restarting all `proxy.py` and `gateway.py` processes whenever you start the `run.py` script, to avoid any potential bug. The restart clears all background function containers and reclaims the memory space. One way to stop the old processes is sketched below.
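This sketch is not part of the FaaSFlow repository; it assumes passwordless SSH from the storage node to every worker and that the proxies and gateway were started with the commands shown earlier.

```python
# Hypothetical helper: stop old proxy/gateway processes before a new run.
import subprocess

WORKER_IPS = ['<worker_ip_1>', '<worker_ip_2>']  # fill in all 7 worker private IPs

for ip in WORKER_IPS:
    # pkill exits non-zero when nothing matches, so do not treat that as an error.
    subprocess.run(['ssh', ip, 'pkill -f proxy.py'], check=False)

# On the storage node itself: stop the gateway and, under MasterSP, the virtual master proxy.
subprocess.run(['pkill', '-f', 'gateway.py'], check=False)
subprocess.run(['pkill', '-f', 'proxy.py'], check=False)
```

After stopping the old processes, start the proxies and the gateway again with the proxy start and gateway start commands before launching `run.py`.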
Directly run on the storage node:

```
python3 run.py
```
Start a proxy on any worker node (skip this if you have already started one during the start-up above) and get its pid. Then run on that worker node:

```
python3 run.py --pid=<pid>
```
Make the WorkerSP deployment and run on the storage node:

```
python3 run.py --datamode=optimized
```

Then make the MasterSP deployment and run it again with `--datamode=raw`.
First, make the WorkerSP deployment and run on the storage node:

```
python3 run.py --datamode=optimized --mode=single
```

Then terminate and restart all `proxy.py` and `gateway.py` processes (for the reasons noted above), and run it again with `--datamode=optimized --mode=corun`.
Second, make the MasterSP deployment and run on the storage node:

```
python3 run.py --datamode=raw --mode=single
```

Then terminate and restart all `proxy.py` and `gateway.py` processes, and run it again with `--datamode=raw --mode=corun`.
Make the WorkerSP deployment and run on the storage node:

```
python3 run.py --controlmode=WorkerSP
```

Then make the MasterSP deployment and run it again with `--controlmode=MasterSP`.
- Download wondershaper from https://github.com/magnific0/wondershaper to the storage node.
- Make the WorkerSP deployment and run the following commands on your storage node. These clear the previous bandwidth setting and set the network bandwidth to 50 MB/s:

  ```
  cd <your_wondershaper_path>/wondershaper
  ./wondershaper -a docker0 -c
  ./wondershaper -a docker0 -u 409600 -d 409600
  ```
- Then run the script on the storage node:

  ```
  python3 run.py --datamode=optimized
  ```
- Make the MasterSP deployment and run it again with `--datamode=raw`.
- Make the WorkerSP deployment and run the following commands on your storage node. These clear the previous bandwidth setting and set the network bandwidth to 25 MB/s:

  ```
  cd <your_wondershaper_path>/wondershaper
  ./wondershaper -a docker0 -c
  ./wondershaper -a docker0 -u 204800 -d 204800
  ```
  Then run the following command on the storage node. Remember to restart all `proxy.py` and `gateway.py` processes whenever you start the `run.py` script, to avoid any potential bug.

  ```
  python3 run.py --bandwidth=25 --datamode=optimized --workflow=genome
  ```
- Clear the previous bandwidth setting and set the network bandwidth to 50 MB/s:

  ```
  cd <your_wondershaper_path>/wondershaper
  ./wondershaper -a docker0 -c
  ./wondershaper -a docker0 -u 409600 -d 409600
  ```

  Then run the following command on the storage node:

  ```
  python3 run.py --bandwidth=50 --datamode=optimized --workflow=genome
  ```
- Other configurations follow the same logic: `-u 614400 -d 614400` and `--bandwidth=75` correspond to 75 MB/s, and `-u 819200 -d 819200` and `--bandwidth=100` correspond to 100 MB/s (a sketch of this conversion appears at the end of this section).
- Make the MasterSP deployment and repeat the two bandwidth steps above (25 MB/s and 50 MB/s), but with `--datamode=raw`. The evaluation of the video benchmark then follows the same logic with `--workflow=video`.
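The wondershaper values used in this section are the bandwidth caps expressed in Kbit/s; the factor can be inferred from the pairs listed above (25 MB/s ↔ 204800, 50 MB/s ↔ 409600, 75 MB/s ↔ 614400, 100 MB/s ↔ 819200). A small sketch of the conversion, in case you want to try other caps:

```python
# Convert a bandwidth cap in MB/s to the number passed to wondershaper's -u/-d options.
# The factor (8 * 1024, i.e. MB/s -> Kbit/s) is inferred from the pairs used in this guide:
# 25 -> 204800, 50 -> 409600, 75 -> 614400, 100 -> 819200.
def wondershaper_rate(mb_per_sec: int) -> int:
    return mb_per_sec * 8 * 1024

for cap in (25, 50, 75, 100):
    rate = wondershaper_rate(cap)
    print(f'{cap} MB/s  ->  ./wondershaper -a docker0 -u {rate} -d {rate}')
```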