- Note: please refer to ae_instructions.md for the getting-started and detailed instructions.
- If you have any questions, please contact us (sysheng21@cse.cuhk.edu.hk).
- Overview
- System Preparation
- Data Preparation
- Running Experiments (Automatic Evaluation)
- Running Workloads (Manual Evaluation)
- Aggregate Statistics (Manual Evaluation)
- Appendix
- FarReach: our in-switch write-back caching
- NoCache: a baseline without in-switch caching
- NetCache: a baseline with in-switch write-through caching
- You can follow our settings below to build your own testbed
- Machine requirements
- Four physical machines with Ubuntu 16.04/18.04
- One main client (hostname: dl11)
- One secondary client (hostname: dl20)
- Two servers (hostnames: dl21 and dl30)
- Note: the controller is co-located with the first server (dl21)
- One 2-pipeline Tofino switch with SDE 8.9.1 (hostname: bf3)
- Note: newer SDE versions (e.g., 9.0.1) cannot compile P4_14 correctly due to their own compiler bugs
- Network configuration and topology
- All clients/servers/switch are in the same local area network (IP mask: 172.16.255.255), which does NOT go through the {switch} data plane
- Main client: 172.16.112.11
- Secondary client: 172.16.112.20
- First server (co-located with controller): 172.16.112.21
- Second server: 172.16.112.30
- Tofino switch OS: 172.16.112.19
- Testbed topology of the programmable-switch-based network (traffic goes through the {switch} data plane)
- Main client (NIC: enp129s0f0; MAC: 3c:fd:fe:bb:ca:78) <-> Tofino switch (front panel port: 5/0)
- Secondary client (NIC: ens2f0; MAC: 9c:69:b4:60:34:f4) <-> Tofino switch (front panel port: 21/0)
- First server (NIC: ens2f1; MAC: 9c:69:b4:60:34:e1) <-> Tofino switch (front panel port: 6/0)
- Second server (NIC: ens3f1; MAC: 9c:69:b4:60:ef:c1) <-> Tofino switch (front panel port: 3/0)
- Tofino switch (front panel port: 7/0) <-> Tofino switch (front panel port: 12/0) (for in-switch cross-pipeline recirculation)
- Note: system preparation has already been done in our AE testbed, so AEC members do NOT need to re-execute the following steps.
- Install python libraries for python 2.7.12 in {main client} if not
pip install -r requirements.txt
- Install libboost 1.81.0 in {first server} and {second server} if not
- Under project directory (e.g., /home/ssy/projects/farreach-public)
wget https://boostorg.jfrog.io/artifactory/main/release/1.81.0/source/boost_1_81_0.tar.gz
tar -xzvf boost_1_81_0.tar.gz
cd boost_1_81_0
./bootstrap.sh --with-libraries=system,thread --prefix=./install && sudo ./b2 install
- Install Maven 3.3.9 in {main client} and {secondary client} if not
- Install Java OpenJDK-8 in {main client} and {secondary client} if not
- Configure JAVA_HOME: add something like
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
(based on your own Java path) to ~/.bashrc
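For example, a minimal way to persist this setting and reload the shell configuration (assuming the default OpenJDK 8 path shown above):

```bash
# Append JAVA_HOME to ~/.bashrc and reload it in the current shell
echo 'export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/' >> ~/.bashrc
source ~/.bashrc
```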
- Install Tofino SDE 8.9.1 in {switch} if not
- Note: newer SDE versions (e.g., 9.0.1) cannot compile P4_14 correctly due to their own compiler bugs
- Install gcc-7.5.0 and g++-7.5.0 in two clients and two servers if not (NO need for {switch})
- Use RocksDB 6.22.1 (already embedded in the project; NO need for extra installation)
- Update the following configuration settings in scripts/global.sh based on your own testbed (an illustrative snippet is shown after this list)
- USER: Linux username (e.g., ssy)
- SWITCH_PRIVATEKEY: the password-free ssh key used by {main client} to connect with {switch} as the root user (e.g., ~/.ssh/switch-private-key)
- Note: see the detailed steps in "Update SSH configuration settings" at the end of this Section 1.2
- CONNECTION_PRIVATEKEY: the password-free ssh key used by the following connections as a non-root user (e.g., ~/.ssh/connection-private-key)
- By {main client} to connect with the two servers, and by {first server} to connect with {second server} (for load_and_backup)
- By {switch} to connect with the two servers (for maxseq and in-switch snapshot)
- By the two servers to connect with the two clients (for client-side backups)
- By {second server} to connect with {first server} (co-located with controller) (for in-switch snapshot)
- Note: see the detailed steps in "Update SSH configuration settings" at the end of this Section 1.2
- MAIN_CLIENT: hostname of {main client} (e.g., dl11)
- SECONDARY_CLIENT: hostname of {secondary client} (e.g., dl20)
- SERVER0: hostname of {first server}, co-located with controller (e.g., dl21)
- SERVER1: hostname of {second server} (e.g., dl30)
- LEAFSWITCH: hostname of the Tofino {switch} (e.g., bf3)
- CLIENT/SWITCH/SERVER_ROOTPATH: project directory path in clients/switch/servers (e.g., /home/ssy/projects/farreach-public)
- EVALUATION_OUTPUT_PREFIX: path to store the raw statistics of each experiment for aggregated analysis
- Network settings
- Main client
- MAIN_CLIENT_TOSWITCH_IP: the IP address of the NIC for {main client} connecting to {switch}
- MAIN_CLIENT_TOSWITCH_MAC: the MAC address of the NIC for {main client} connecting to {switch}
- MAIN_CLIENT_TOSWITCH_FPPORT: the front panel port in the {switch} data plane corresponding to {main client}
- MAIN_CLIENT_TOSWITCH_PIPEIDX: the pipeline index (0 or 1) of MAIN_CLIENT_TOSWITCH_FPPORT
- MAIN_CLIENT_LOCAL_IP: the local IP address of the NIC for {main client} connecting to the local area network (NOT through the {switch} data plane)
- Secondary client
- SECONDARY_CLIENT_TOSWITCH_IP: the IP address of the NIC for {secondary client} connecting to {switch}
- SECONDARY_CLIENT_TOSWITCH_MAC: the MAC address of the NIC for {secondary client} connecting to {switch}
- SECONDARY_CLIENT_TOSWITCH_FPPORT: the front panel port in the {switch} data plane corresponding to {secondary client}
- SECONDARY_CLIENT_TOSWITCH_PIPEIDX: the pipeline index (0 or 1) of SECONDARY_CLIENT_TOSWITCH_FPPORT
- SECONDARY_CLIENT_LOCAL_IP: the local IP address of the NIC for {secondary client} connecting to the local area network (NOT through the {switch} data plane)
- First server
- SERVER0_TOSWITCH_IP: the IP address of the NIC for {first server} connecting to {switch}
- SERVER0_TOSWITCH_MAC: the MAC address of the NIC for {first server} connecting to {switch}
- SERVER0_TOSWITCH_FPPORT: the front panel port in the {switch} data plane corresponding to {first server}
- SERVER0_TOSWITCH_PIPEIDX: the pipeline index (0 or 1) of SERVER0_TOSWITCH_FPPORT
- SERVER0_LOCAL_IP: the local IP address of the NIC for {first server} connecting to the local area network (NOT through the {switch} data plane)
- Second server
- SERVER1_TOSWITCH_IP: the IP address of the NIC for {second server} connecting to {switch}
- SERVER1_TOSWITCH_MAC: the MAC address of the NIC for {second server} connecting to {switch}
- SERVER1_TOSWITCH_FPPORT: the front panel port in the {switch} data plane corresponding to {second server}
- SERVER1_TOSWITCH_PIPEIDX: the pipeline index (0 or 1) of SERVER1_TOSWITCH_FPPORT
- SERVER1_LOCAL_IP: the local IP address of the NIC for {second server} connecting to the local area network (NOT through the {switch} data plane)
- Controller (co-located with first server)
- CONTROLLER_LOCAL_IP: the IP address of the NIC for the {controller} connecting to {switch} (the same as SERVER0_TOSWITCH_IP in our testbed)
- Switch
- SWITCHOS_LOCAL_IP: the IP address of the NIC for the {switch} OS connecting to the local area network (NOT through the {switch} data plane)
- SWITCH_RECIRPORT_PIPELINE1TO0: the front panel port in the {switch} data plane for pipeline 1 to connect with pipeline 0 for in-switch recirculation
- SWITCH_RECIRPORT_PIPELINE0TO1: the front panel port in the {switch} data plane for pipeline 0 to connect with pipeline 1 for in-switch recirculation
- CPU settings
- First server
- SERVER0_WORKER_CORENUM: the number of CPU cores specifically used for processing requests in {first server} (e.g., 16)
- SERVER0_TOTAL_CORENUM: the total number of CPU cores in {first server} (MUST be larger than SERVER0_WORKER_CORENUM; e.g., 48)
- Second server
- SERVER1_WORKER_CORENUM: the number of CPU cores specifically used for processing requests in {second server} (e.g., 16)
- SERVER1_TOTAL_CORENUM: the total number of CPU cores in {second server} (MUST be larger than SERVER1_WORKER_CORENUM; e.g., 48)
- Run
bash scripts/local/update_config_files.sh
to update ini configuration files based on the above network and CPU settings in scripts/global.sh
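For reference, the settings above map onto shell variables in scripts/global.sh roughly as follows. This is only a partial, illustrative sketch using the example testbed in Section 1.1; the exact variable list and format come from the shipped scripts/global.sh, and the expansion of CLIENT/SWITCH/SERVER_ROOTPATH into three variables as well as the EVALUATION_OUTPUT_PREFIX path are assumptions here.

```bash
# Partial, illustrative scripts/global.sh (replace every value with your own testbed settings)
USER=ssy
SWITCH_PRIVATEKEY=~/.ssh/switch-private-key
CONNECTION_PRIVATEKEY=~/.ssh/connection-private-key

MAIN_CLIENT=dl11
SECONDARY_CLIENT=dl20
SERVER0=dl21
SERVER1=dl30
LEAFSWITCH=bf3

CLIENT_ROOTPATH=/home/ssy/projects/farreach-public   # assumed expansion of CLIENT/SWITCH/SERVER_ROOTPATH
SWITCH_ROOTPATH=/home/ssy/projects/farreach-public
SERVER_ROOTPATH=/home/ssy/projects/farreach-public
EVALUATION_OUTPUT_PREFIX=/home/ssy/aeresults          # hypothetical path for raw statistics

# Main client (MAC and front panel port come from the topology in Section 1.1;
# fill in the data-plane NIC IP and the pipeline index of your own switch)
MAIN_CLIENT_TOSWITCH_MAC=3c:fd:fe:bb:ca:78
MAIN_CLIENT_TOSWITCH_FPPORT=5/0
MAIN_CLIENT_LOCAL_IP=172.16.112.11

# SECONDARY_CLIENT_*, SERVER0_*, SERVER1_*, CONTROLLER_LOCAL_IP, SWITCHOS_LOCAL_IP,
# SWITCH_RECIRPORT_*, and the CPU settings follow the same pattern.
```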
- Update ssh configuration settings
- In any of the above machines (2 clients, 2 servers, and 1 switch), if you need to manually type yes/no to confirm the host key when using the ssh command to connect to other machines, add the following content to ~/.ssh/config on that machine:
Host *
    StrictHostKeyChecking no
- Generate SWITCH_PRIVATEKEY
- Under {main client}, if the ssh key has not been created, run
sudo ssh-keygen -t rsa -f /home/{USER}/.ssh/switch-private-key
(use an empty password for no passphrase)
- Also run
sudo chown {USER}:{USER} /home/{USER}/.ssh/switch-private-key
(use your Linux username) to change the owner
- Append the content of ~/.ssh/switch-private-key.pub of {main client} into /root/.ssh/authorized_keys of {switch}
- Generate CONNECTION_PRIVATEKEY
- Consider the following source-connectwith-destination pairs:
- {main client}-connectwith-{first server} and {main client}-connectwith-{second server}
- {first server}-connectwith-{second server}
- {switch}-connectwith-{first server} and {switch}-connectwith-{second server}
- {first server}-connectwith-{main client}, {first server}-connectwith-{secondary client}, {second server}-connectwith-{main client}, and {second server}-connectwith-{secondary client}
- {second server}-connectwith-{first server} (controller is co-located with {first server})
- For each pair of source connecting with destination (see the sketch after this list)
- Under {source}, if the ssh key has not been created, run
ssh-keygen -t rsa -f ./connection-private-key; mv ./connection-private-key* ~/.ssh
(use an empty password for no passphrase)
- Append the content of ~/.ssh/connection-private-key.pub of {source} into ~/.ssh/authorized_keys of {destination}
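For example, a minimal sketch for one {source} (here {main client}, dl11 in our example testbed) distributing its public key to its {destination}s with ssh-copy-id; repeat the same idea for the other pairs, and adjust the hostnames and Linux user to your own testbed:

```bash
# Push connection-private-key.pub from {main client} into ~/.ssh/authorized_keys of the two servers
for dst in dl21 dl30; do
    ssh-copy-id -i ~/.ssh/connection-private-key.pub "${dst}"
done
```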
- Double check ssh connectivity under Tofino {switch}
- Try
ssh -i ~/.ssh/connection-private-key {first server}
and
ssh -i ~/.ssh/connection-private-key {second server}
- If you encounter an error of "Failed to add the host to the list of known hosts (/home/{USER}/.ssh/known_hosts)", it means that {USER} does NOT have permission to access ~/.ssh
- Run su to enter root mode
- Run chown -R {USER}:{USER} /home/{USER}/.ssh
- Sync and compile RocksDB (TIME: around 3 hours; ONLY need to perform once)
- Under {main client}, run
bash scripts/remote/sync_kvs.sh
to sync rocksdb-6.22.1/ to {first server} and {second server}
- Under {first server} and {second server}, compile RocksDB
- Run
sudo apt-get install libgflags-dev libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev libzstd-dev libjemalloc-dev
to install the necessary dependencies for RocksDB
- Run
cd rocksdb-6.22.1
PORTABLE=1 make static_lib
- We have already commented out -Wstrict-prototype and -Werror in RocksDB's Makefile to fix compilation errors caused by strict-aliasing warnings
- We use PORTABLE=1 to fix a runtime error of illegal instruction when calling open()
- For each method (farreach or nocache or netcache)
- Run
mkdir /tmp/{method}
to prepare the directory (e.g., /tmp/farreach) for database in advance
- Compile source code for all methods
- Under {main client}, run
bash scripts/remote/firstcompile.sh
to compile software code for all methods in clients, servers, and switch (TIME: around 1 hour)
- For each {method}, under {switch}
- Run su to enter root mode
- Run
cd {method}/tofino; bash compile.sh
to make the P4 code (TIME: around 3 hours)
- If you have compiled the P4 code of {method} before, you do NOT need to re-compile it
- If you really want to re-compile it (maybe due to P4 code modification), delete the corresponding directory (netbufferv4, nocache, or netcache) under $SDE/pkgsrc/p4-build/tofino/ before re-compilation
- Building your testbed based on the network settings provided by scripts/global.sh
- Before running the following scripts, in each of the clients and servers (NO need for the Tofino switch), append the following content into /etc/security/limits.conf by sudo vim (a non-interactive sketch is shown after this list):
{USER} hard nofile 1024000
{USER} soft nofile 1024000
- In {main client}, run
bash scripts/local/configure_client.sh 0
- In {secondary client}, run
bash scripts/local/configure_client.sh 1
- In {first server}, run
bash scripts/local/configure_server.sh 0
- In {second server}, run
bash scripts/local/configure_server.sh 1
- In {switch} OS, run
su
to enter root mode and run
bash scripts/local/configure_switchos.sh
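As referenced above, a minimal non-interactive way to append the nofile limits (run it on each client and server; $USER expands to your Linux username):

```bash
# Raise the open-file limits for the current Linux user without opening an editor
printf '%s hard nofile 1024000\n%s soft nofile 1024000\n' "$USER" "$USER" | sudo tee -a /etc/security/limits.conf
```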
- Note: data preparation has already been done in our AE testbed, so AEC members do NOT need to re-execute the following steps.
- Perform the loading phase and backup for evaluation time reduction (TIME: around 40 minutes; ONLY need to perform once)
- Under {main client}
- Run
bash scripts/remote/setmethod.sh nocache
to set DIRNAME as nocache and sync it to other machines
- Run
bash scripts/remote/prepare_load.sh
to copy recordload/config.ini to overwrite nocache/config.ini in all clients, servers, and switch
- Under {switch}, create two terminals
- In the first terminal
cd nocache/tofino
- Run su to enter root mode
- Run
bash start_switch.sh
to launch the nocache switch data plane, which will open a CLI
- In the second terminal
cd nocache
- Run su to enter root mode
- Run
bash localscripts/launchswitchostestbed.sh
to launch the nocache switch OS and other daemon processes
- Under {main client}
- Run
bash scripts/remote/load_and_backup.sh
to launch servers and clients automatically for loading and backup
- Note: scripts/remote/load_and_backup.sh will kill the servers and clients automatically at the end
- Note: scripts/remote/load_and_backup.sh will also restore the original nocache/config.ini in all clients, servers, and switch after all steps
- Under {switch}
- In the first terminal
- Type exit and press enter to stop the switch data plane
- If the CLI is still NOT closed, type Ctrl+C in the CLI to stop the switch data plane
- In the second terminal
cd nocache
- Run su to enter root mode
- Run
bash localscripts/stopswitchtestbed.sh
to stop the switch OS and other daemon processes
- Under {main client}, run
bash scripts/remote/stopall.sh
to forcibly kill all processes (a consolidated sketch of this loading workflow is shown below)
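The consolidated sketch below simply restates the loading workflow above in execution order; comments mark which machine and terminal each command belongs to.

```bash
# --- {main client} ---
bash scripts/remote/setmethod.sh nocache        # set DIRNAME=nocache and sync it
bash scripts/remote/prepare_load.sh             # overwrite nocache/config.ini with recordload/config.ini

# --- {switch}, first terminal (as root): launch the data plane and keep the CLI open ---
#   cd nocache/tofino && bash start_switch.sh

# --- {switch}, second terminal (as root): launch switch OS and daemon processes ---
#   cd nocache && bash localscripts/launchswitchostestbed.sh

# --- {main client}: load the 100M records and create backups ---
bash scripts/remote/load_and_backup.sh

# --- {switch}: type exit in the CLI of the first terminal, then in the second terminal (as root) ---
#   cd nocache && bash localscripts/stopswitchtestbed.sh

# --- {main client}: forcibly kill any leftover processes ---
bash scripts/remote/stopall.sh
```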
- Analyze workloads to generate workload-related information before evaluation (ONLY need to perform once)
- Workload-related information includes:
- Dump hot keys and per-client-thread pregenerated workloads (independent of the server rotation scale, i.e., # of simulated servers)
- Generate key popularity change rules for dynamic patterns (independent of the server rotation scale)
- Calculate bottleneck partitions for different server rotation scales (i.e., 16, 32, 64, and 128)
- For each {workload}
- Options of {workload}
- YCSB core workloads: workloada, workloadb, workloadc, workloadd, workloadf, workload-load
- Synthetic workloads: synthetic (i.e., 100% writes, 0.99 skewness, 128B valuesize), synthetic-25, synthetic-75, skewness-90, skewness-95, uniform, valuesize-16, valuesize-32, valuesize-64
- Note: NO need to analyze the following workloads due to duplication
- synthetic-0 is the same as workloadc
- synthetic-50 is the same as workloada
- synthetic-100, skewness-99, and valuesize-128 are the same as synthetic
- TIME cost of workload analysis
- Each of most workloads needs around 5 minutes (including workloada, workloadb, workloadc, workloadd, workloadf, workload-load, synthetic, valuesize-16, valuesize-32, and valuesize-64)
- Each of other workloads needs 20 minutes to 40 minutes (including skewness-90 of 20m, skewness-95 of 40m, and uniform of 30m)
- The reason is that YCSB uses Zipfian key generator with small skewness and uniform key generator for these workloads, which incurs large computation overhead
- Although workload-load is also uniform distribution, YCSB uses counter generator which has small computation overhead
- Under {main client}
- Update workload_name as {workload} in keydump/config.ini
- Note: you do NOT need to change the configurations related to server rotation scales (i.e., server_total_logical_num and server_total_logical_num_for_rotation) in keydump/config.ini, which are NOT used here
- Run
bash scripts/remote/keydump_and_sync.sh
to dump the workload-related information (e.g., hot keys and bottleneck serveridx) and sync it to all clients, servers, and switch (a batch sketch covering all workloads is shown after this list)
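As referenced above, a minimal batch sketch that analyzes all workloads in one pass; the sed pattern assumes workload_name is stored as a workload_name=... line in keydump/config.ini, so edit the file manually if your ini format differs:

```bash
# Analyze every workload listed above (each iteration takes roughly 5-40 minutes)
for workload in workload-load workloada workloadb workloadc workloadd workloadf \
                synthetic synthetic-25 synthetic-75 skewness-90 skewness-95 uniform \
                valuesize-16 valuesize-32 valuesize-64; do
    sed -i "s/^workload_name=.*/workload_name=${workload}/" keydump/config.ini
    bash scripts/remote/keydump_and_sync.sh
done
```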
- Note: you can refer to the AE instructions for the getting-started and detailed instructions.
- Note: if you want to add any new experiment script by yourself in scripts/exps/, the script file name should NOT include the reserved strings (ycsb, server, controller, reflector, and server_rotation); otherwise, the new experiment script may kill itself during evaluation.
- To reproduce the experiments in our evaluation, we provide the following scripts under scripts/exps/
- We use server rotation to cope with limited machines, which requires a relatively long time
- Each iteration fixes the bottleneck partition (and deploys a non-bottleneck partition) (TIME: around 5 minutes)
- Each server rotation comprises tens of iterations to simulate multiple machines (TIME: around 1-8 hour(s))
- Each experiment needs multiple server rotations for different methods and parameter settings (TIME: around 1-2 day(s))
- Each round includes multiple experiments to evaluate from different perspectives (TIME: around 1 week)
- We need multiple rounds to reduce runtime variation (TIME: around 1-2 month(s))
- Note: as the time for evaluation is relatively long, you may want to run scripts in the background
- Make sure that you use screen or nohup to run each script in the background; otherwise, the script will be killed by the OS after you log out from the ssh connection
- Note: before and after each experiment, run
scripts/remote/stopall.sh
to forcibly kill all processes and avoid conflicts
- Note: if you encounter many timeouts when issuing requests in the main client, your servers may have failed to launch (e.g., due to code compiled with incorrect server rotation enabling/disabling) or may launch with a large delay (e.g., due to limited server-side power in your testbed for loading the database files of 100M pre-loaded records); in that case, the client-issued requests are not answered and hence time out.
- You can increase the sleep time after launching servers and before launching clients in scripts/remote/test_dynamic.sh (currently 240s), scripts/remote/test_server_rotation.sh (currently 60s), scripts/remote/test_server_rotation_p1.sh (currently 120s), and scripts/remote/test_server_rotation_p2.sh (currently 120s).
- Note: if you encounter any other problem, you can keep the critical information and contact us (sysheng21@cse.cuhk.edu.hk) for help
- The causes of the problem may include testbed mis-configuration, script mis-usage, resource conflicts (e.g., the Tofino switch data plane cannot run multiple P4 programs simultaneously), Tofino hardware bugs, and our code bugs
- The critical information should include: the terminal command history, the information dumped by the scripts, the log files generated by the scripts (e.g., {method}/tmp_*.out in servers and switch, benchmark/ycsb/tmp_*.out in clients, and {method}/tofino/tmp_*.out in switch), and the raw statistics generated by the YCSB clients (e.g., benchmark/ycsb/{method}-statistics/)
- Scripts for different experiments

Exp # | Scripts | Description
---|---|---
1 | run_exp_throughput.sh | Throughput analysis for different YCSB core workloads under static workload pattern (with server rotation)
2 | run_exp_latency.sh | Latency analysis for different target throughputs under static workload pattern (with server rotation)
3 | run_exp_scalability.sh | Scalability for different # of simulated servers under static workload pattern (with server rotation)
4 | run_exp_write_ratio.sh | Synthetic workloads with different write ratios under static workload pattern (with server rotation)
5 | run_exp_key_distribution.sh | Synthetic workloads with different key distributions under static workload pattern (with server rotation)
6 | run_exp_value_size.sh | Synthetic workloads with different value sizes under static workload pattern (with server rotation)
7 | run_exp_dynamic.sh | Synthetic workloads with different dynamic workload patterns (NO server rotation)
8 | run_exp_snapshot.sh | Performance and control-plane bandwidth overhead of snapshot generation under different dynamic workload patterns (NO server rotation)
9 | run_exp_recovery.sh | Crash recovery time under static workload pattern (with server rotation)
10 | N/A | See hardware resource usage of {method} in the corresponding directory in {switch} (e.g., $SDE/pkgsrc/p4-build/tofino/nocache/visualization, $SDE/pkgsrc/p4-build/tofino/netcache/visualization, and $SDE/pkgsrc/p4-build/tofino/netbufferv4/visualization, where netbufferv4 corresponds to farreach)
- Other useful scripts
- scripts/remote/stopall.sh: forcibly stop and kill ycsb clients, server rotation scripts, dynamic scripts, servers (including controller and simulated reflector), and switch (including data plane and switch OS).
- Note: you can run this script to kill all involved processes, if the previous experiment fails (e.g., due to testbed mis-configuration)
- scripts/remote/enable_server_rotation.sh: update common/helper.h to enable server rotation and re-compile software code of all methods
- scripts/remote/disable_server_rotation.sh: update common/helper.h to disable server rotation and re-compile software code of all methods
- scripts/results/*.sh: parse raw output file of run_exp_*.sh to get results
- With server rotation: run each {experiment} except exp_dynamic and exp_snapshot
- Options of {experiment}: exp_throughput, exp_latency, exp_scalability, exp_write_ratio, exp_key_distribution, exp_value_size, and exp_recovery
- Under {main client}
- Re-compile all methods to enable server rotation if it is not enabled yet:
bash scripts/remote/enable_server_rotation.sh
- Run
nohup bash scripts/exps/run_{experiment}.sh <roundnumber> >tmp_{experiment}.out 2>&1 &
- Note: we run each experiment for multiple rounds to eliminate the effect of runtime variation (e.g., RocksDB fluctuation), so we need to specify <roundnumber> to indicate the index of the current round for running an experiment
- After the experiment, under {main client} (see the example after this list)
- Run
bash scripts/remote/stopall.sh
to kill all involved processes
- Run
bash scripts/results/parse_{experiment}.sh tmp_{experiment}.out
to get the results
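For example, a complete pass for exp_throughput (round 0), assuming server rotation is already enabled:

```bash
# Run experiment 1 (throughput analysis) in the background and keep its output
nohup bash scripts/exps/run_exp_throughput.sh 0 >tmp_exp_throughput.out 2>&1 &

# After the run finishes (it may take 1-2 days), clean up and parse the results
bash scripts/remote/stopall.sh
bash scripts/results/parse_exp_throughput.sh tmp_exp_throughput.out
```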
- Without server rotation: run {experiment} of exp_dynamic or exp_snapshot
- Under {main client}
- Re-compile code to disable server rotation if it is not disabled yet:
bash scripts/remote/disable_server_rotation.sh
- Run
bash scripts/exps/run_{experiment}.sh <roundnumber>
- Note: we run each experiment for multiple rounds to eliminate the effect of runtime variation (e.g., RocksDB fluctuation), so we need to specify <roundnumber> to indicate the index of the current round
- After the experiment, under {main client}
- Run
bash scripts/remote/stopall.sh
to kill all involved processes
- Run
bash scripts/results/parse_{experiment}.sh tmp_{experiment}.out
to get the results
- As most experiments use server rotation for static pattern instead of dynamic patterns, you may want to re-compile your code to enable server rotation again
- Under {main client}, enable server rotation:
bash scripts/remote/enable_server_rotation.sh
- Notes for exp_recovery
- Possible errors for scp in farreach/localscripts/fetch*.sh (maybe due to testbed mis-configurations)
- If you have an error of "hot key verification failed", check whether {switch} can connect with the two servers, the two servers can connect with the two clients, and {second server} can connect with {first server}, via CONNECTION_PRIVATEKEY
- If you have an error of "permission denied" when transferring files, check the correctness of the ownership of /tmp/farreach in {switch} and the two servers
- If you have an error of "permission denied (public key)", check whether you specify the correct CONNECTION_PRIVATEKEY in {switch} and the two servers
- If you want to test recovery time based on previous raw statistics of exp_recovery instead of running server-rotation-based experiment for exp_recovery again
- Step 1: check scripts/global.sh under {main client}
- Make sure EVALUATION_OUTPUT_PREFIX points to the path of the previous raw statistics of exp_recovery (including in-switch snapshots, client-side backups, and maxseq) generated by previous server-rotation-based experiments
- Step 2: under {main client}
- Set exp9_recoveryonly as 1 in scripts/exps/run_exp_recovery.sh
- Run scripts/exps/run_exp_recovery.sh, which will skip the step of running a new server-rotation-based experiment for exp_recovery (see the sketch after this list)
- Note: the crash recovery time is strongly related to the network settings of each specific testbed, which may cause differences on the order of 0.1 seconds
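A minimal sketch of the recovery-only re-run (round 0 as an example); the sed pattern assumes exp9_recoveryonly is assigned as an exp9_recoveryonly=0 line inside the script, so edit the file manually if the assignment format differs:

```bash
# Recompute the crash recovery time from previously collected raw statistics
sed -i 's/^exp9_recoveryonly=.*/exp9_recoveryonly=1/' scripts/exps/run_exp_recovery.sh
bash scripts/exps/run_exp_recovery.sh 0
```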
- During one server rotation composed of tens of iterations, some iterations may fail due to database performance fluctuation or unstable testbed
- To fix this issue, we provide a script to run a single iteration (TIME: around 5 minutes) for each failed iteration instead of re-running all iterations of the server rotation (TIME: around 1-8 hour(s)), which is time-consuming
- If scripts (e.g., scripts/local/calculate_statistics.sh) say that you need to perform a single iteration for each missing iteration number of an incomplete server rotation of the experiment
- Under {main client}, run
bash scripts/exps/run_makeup_rotation_exp.sh <expname> <roundnumber> <methodname> <workloadname> <serverscale> <bottleneckidx> <targetrotation> [targetthpt]
to launch a single iteration
- expname: experiment name (e.g., "exp1" for throughput analysis)
- Note: expname only indicates the path to store the raw statistics, and does NOT affect the experiment results
- roundnumber: the index of the current round for running the experiment (e.g., 0)
- methodname: experiment method (e.g., farreach)
- workloadname: workload name (e.g., workloada)
- serverscale: number of simulated servers (e.g., 16)
- bottleneckidx: bottleneck server index of the server rotation, related to the workload and scale (e.g., 14)
- targetrotation: the non-bottleneck server index in the missing iteration (e.g., 10)
- targetthpt: the target throughput of the server rotation for the missing iteration, only applicable for exp2 (latency analysis)
- The above arguments of scripts/exps/run_makeup_rotation_exp.sh are determined by the missing iteration of the server rotation for the specific experiment
- For example, for exp_throughput, you may pass expname=exp1, roundnumber=0, methodname=farreach, workloadname=workloada, serverscale=16, bottleneckidx=14, and targetrotation=10 to execute the 11th iteration (see the command sketch after this list)
- The script will deploy the bottleneck serveridx 14 in {first server} and the non-bottleneck serveridx 10 in {second server} for the single iteration, and update the raw statistics in place
- Note: scripts/exps/run_makeup_rotation_exp.sh does NOT support exp_dynamic or exp_snapshot, as the experiments with dynamic workload patterns do NOT use server rotation
- Therefore, this script ONLY works for experiments with server rotation: exp_key_distribution, exp_latency, exp_scalability, exp_value_size, exp_write_ratio, and exp_throughput
- Note: scripts/exps/run_makeup_rotation_exp.sh currently does NOT support a failure of the first iteration (i.e., when ONLY the bottleneck partition is deployed in {first server})
- You may refer to Section 4.2 (especially Step 4) to use scripts/remote/test_server_rotation_p1.sh instead
- Make sure scripts/global.sh, scripts/common.sh, and ${method}/config.ini are correctly configured before running scripts/remote/test_server_rotation_p1.sh
- (TODO) We may update scripts/exps/run_makeup_rotation_exp.sh to support the first iteration in the future
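As referenced above, the exp_throughput example corresponds to the following command line (argument values are exactly the ones listed above):

```bash
# Re-run only the missing 11th iteration (targetrotation=10) of exp1, round 0,
# for farreach on workloada with 16 simulated servers and bottleneck serveridx 14
bash scripts/exps/run_makeup_rotation_exp.sh exp1 0 farreach workloada 16 14 10
```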
- Decide {workload} (e.g., workloada), {method} (e.g., farreach), and {dynamic pattern} (e.g., hotin) to use
- Options of {dynamic pattern}: hotin, hotout, and random
- Note: scripts/exps/run_exp_dynamic.sh or scripts/exps/run_exp_snapshot.sh includes all the following steps (except Step 2, as the two scripts assume that you have already disabled server rotation in advance)
- Step 1: prepare ini config files (under {main client})
- Update the settings in the config file {method}/config.ini (e.g., farreach/config.ini):
- Set global::workload_mode=1
- Set global::workload_name={workload}
- Set global::dynamic_ruleprefix={dynamic pattern}
- Set global::server_physical_num=2
- Set global::server_total_logical_num=2
- Set server0::server_logical_idxes=0
- Set server1::server_logical_idxes=1
- Set DIRNAME as {method} in
scripts/common.sh
- Double-check the global testbed settings in scripts/global.sh based on your testbed
- Step 2: if server rotation is enabled (default setting), re-compile code to disable server rotation
- Under {main client}, disable server rotation
- Comment out line 82 (#define SERVER_ROTATION) in common/helper.h to disable server rotation
bash scripts/remote/sync_file.sh common helper.h
to sync code changes to all machines
- Under {main client}, for the current {method}, re-compile software code (NO need for P4 code)
- Set DIRNAME as {method} in
scripts/common.sh
- Run
bash scripts/remote/sync_file.sh scripts common.sh
- Under {main client} and {secondary client}, run
bash scripts/local/makeclient.sh
- Under {first server} and {second server}, run
bash scripts/local/makeserver.sh
- Under {switch}, run
bash scripts/local/makeswitchos.sh
- Step 3: launch switch data plane and switch OS
- Create two terminals in {switch}
- Launch switch data plane in the first terminal
cd {method}/tofino
su
- Run
bash start_switch.sh
, which will open a CLI
- Launch switch OS and other daemon processes (for cache management and snapshot generation) in the second terminal
cd {method}
su
bash localscripts/launchswitchostestbed.sh
- Note: if you encounter any problem, you can check the log files of {method}/tmp_*.out and {method}/tofino/tmp_*.out in {switch}
- Step 4: launch servers and clients without server rotation
- Under {main client}:
bash scripts/remote/test_dynamic.sh
- Note: if you encounter any problem
- You can check the output of {main client}
- You can check the log files of benchmark/ycsb/tmp_*.out in {secondary client}
- You can check the log files of {method}/tmp_*.out in {first server} and {second server}
- Step 5: cleanup testbed
- Under {switch}
- Stop switch data plane in the CLI of the first terminal
Type exit and press enter
- If CLI is not closed,
type Ctrl+C
- Stop switch OS and other daemon processes in the second terminal
cd {method}
su
bash localscripts/stopswitchtestbed.sh
- Under {main client}
- Run
bash scripts/remote/stopservertestbed.sh
to stop the servers
- Run
bash scripts/remote/stopall.sh
to forcibly kill all processes
- Step 6: aggregate statistics
- Under {main client}, run
bash scripts/remote/calculate_statistics.sh
- Step 7: if you do NOT run dynamic workload patterns, you should re-compile code to enable server rotation
- Under {main client}, enable server rotation
- Uncomment line 82 (#define SERVER_ROTATION) in common/helper.h to enable server rotation
- Run
bash scripts/remote/sync_file.sh common helper.h
to sync the code changes to all machines
- Under {main client}, for the current {method}, re-compile software code (NO need for P4 code)
- Set DIRNAME as {method} in
scripts/common.sh
- Run
bash scripts/remote/sync_file.sh scripts common.sh
- Under {main client} and {secondary client}, run
bash scripts/local/makeclient.sh
- Under {first server} and {second server}, run
bash scripts/local/makeserver.sh
- Under {switch}, run
bash scripts/local/makeswitchos.sh
- Decide {workload} (e.g., workloada) and {method} (e.g., farreach) to use
- Note: we assume that you have analyzed {workload} to get {bottleneck serveridx} for your {server rotation scale}
- If not, please refer to Section 2.2 for workload analysis
- As bottleneck server index is stable for a given <{workload}, {server rotation scale}>, you can also directly refer to the appendix table in Section 6.1
- Note: we assume that you have already compiled code with server rotation enabled
- If not, please refer to step 7 in Section 4.1 to re-compile code for enabling server rotation
- Note: the scripts in
scripts/exps/
(except run_exp_dynamic.sh and run_exp_snapshot.sh) include all the following steps
- Step 1: prepare ini config files (under {main client})
- Update the settings in the config file {method}/config.ini (e.g., farreach/config.ini):
- Set workload_name={workload}
- Set workload_mode=0
- Set bottleneck_serveridx_for_rotation={bottleneck serveridx}
- Set server_total_logical_num_for_rotation={server rotation scale}
- Note: {method}/config.ini must have the correct {bottleneck serveridx} and {server rotation scale}
- Otherwise, the client-side PregeneratedWorkload will issue requests of incorrect partitions (corresponding to non-running servers) and hence time out
- Set DIRNAME as {method} in
scripts/common.sh
- Double-check the global testbed settings in
scripts/global.sh
based on your testbed
- Step 2: prepare for launching switch data plane and switch OS
- Under {main client}, run
bash scripts/remote/prepare_server_rotation.sh
- This script generates and syncs a new {method}/config.ini based on the existing {method}/config.ini with the configurations you set in Step 1
- The main change in the new {method}/config.ini is that it sets server0::server_logical_idxes as {bottleneck serveridx} (e.g., 14), and sets server1::server_logical_idxes as all other serveridxes except {bottleneck serveridx} (e.g., 0:1:2:3:4:5:6:7:8:9:10:11:12:13:15)
- The goal is that {switch} can use the new {method}/config.ini to configure its packet forwarding rules, such that we do NOT need to re-launch the switch during server rotation
- Step 3: launch switch data plane and switch OS
- Create two terminals in {switch}
- Launch switch data plane in the first terminal
cd {method}/tofino
su
- Run
bash start_switch.sh
, which will open a CLI
- Launch switch OS and other daemon processes (for cache management and snapshot generation) in the second terminal
cd {method}
su
bash localscripts/launchswitchostestbed.sh
- Note: if you encounter any problem, you can check the log files of {method}/tmp_*.out and {method}/tofino/tmp_*.out in {switch}
- Step 4: launch servers and clients with server rotation
- Under {main client}, run
bash scripts/remote/test_server_rotation.sh
- Phase 1: test_server_rotation.sh invokes scripts/remote/test_server_rotation_p1.sh to run the first iteration (the bottleneck partition is deployed into {first server})
- Phase 2: test_server_rotation.sh invokes scripts/remote/test_server_rotation_p2.sh to run the ith iteration (the bottleneck partition is deployed into {first server}, and the ith non-bottleneck partition is deployed into {second server}), where 1 <= i <= {server rotation scale}-1
- Under {main client}, perform a single iteration for each failed iteration if any
- If strid=server-x is missed, run
bash scripts/remote/test_server_rotation_p1.sh 1
- If strid=server-x-y is missed, run:
bash scripts/remote/test_server_rotation_p2.sh 1 y
- Step 5: cleanup testbed
- Under {switch}
- Stop switch data plane in the CLI of the first terminal
Type exit and press enter
- If CLI is not closed,
type Ctrl+C
- Stop switch OS and other daemon processes in the second terminal
cd {method}
su
bash localscripts/stopswitchtestbed.sh
- Under {main client}
- Run
bash scripts/remote/stopservertestbed.sh
to stop the servers
- Run
bash scripts/remote/stopall.sh
to forcibly kill all processes
- Step 6: aggregate statistics
- Under {main client}, run
bash scripts/remote/calculate_statistics.sh
(a consolidated sketch of one manual server-rotation run is shown after this list)
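As referenced above, a consolidated sketch of one manual server-rotation run (Steps 2-6), restating the commands above in execution order; comments mark the machine and terminal:

```bash
# --- {main client}: generate and sync the rotated {method}/config.ini ---
bash scripts/remote/prepare_server_rotation.sh

# --- {switch}, first terminal (as root): launch the data plane and keep the CLI open ---
#   cd {method}/tofino && bash start_switch.sh

# --- {switch}, second terminal (as root): launch switch OS and daemon processes ---
#   cd {method} && bash localscripts/launchswitchostestbed.sh

# --- {main client}: run all iterations of the server rotation ---
bash scripts/remote/test_server_rotation.sh

# --- {switch}: exit the CLI in the first terminal, then in the second terminal (as root) ---
#   cd {method} && bash localscripts/stopswitchtestbed.sh

# --- {main client}: stop servers, kill leftovers, and aggregate statistics ---
bash scripts/remote/stopservertestbed.sh
bash scripts/remote/stopall.sh
bash scripts/remote/calculate_statistics.sh
```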
- We provide the following scripts to help aggregate statistics
- calculate_statistics.sh: calculate throughput and latency with or without server rotation
- calculate_bwcost.sh: calculate control-plane bandwidth cost
- calculate_recovery_time.sh: calculate crash recovery time
- Notes
- If you use the automatic way in Section 3 for evaluation
- As the scripts/exps/run_exp_* scripts aggregate the statistics automatically, you can redirect the stdout of the script into a file and find the aggregated results in that file
- For example, after
nohup bash scripts/exps/run_exp_throughput.sh >tmp.out 2>&1 &
, you can find the aggregated results in tmp.out
- If you use the manual way in Section 4 for evaluation
- You can follow Section 5.2 to run a script (e.g., calculate_statistics.sh) and get the corresponding aggregated statistics
- The scripts will aggregate the statistics based on the settings in
scripts/global.sh
,scripts/common.sh
, and{method}/config.ini
- Calculate throughput and latency with server rotation yet without target throughput
- Under {main client}, run
bash scripts/remote/calculate_statistics.sh 0
- Supported experiments: exp1, exp3, exp4, exp5, and exp6
- Output example:
...
[STATIC] average bottleneck totalthpt: 0.092875 MOPS; switchthpt: 0.0245 MOPS; serverthpt: 0.0675625 MOPS
[STATIC] aggregate throughput: 1.31126577666 MOPS; normalized throughput: 19.4081891088, imbalanceratio: 1.01388888889
[STATIC] average latency 286.901026111 us, medium latency 85 us, 90P latency 584 us, 95P latency 1717 us, 99P latency 2597 us
...
- Under {main client}, run
- Calculate throughput and latency with server rotation and with target throughput
- Under {main client}, run
bash scripts/remote/calculate_statistics.sh 1
- Supported experiment: exp2
- Output example:
...
[STATIC] average bottleneck totalthpt: 0.0173125 MOPS; switchthpt: 0.00975 MOPS; serverthpt: 0.006875 MOPS
[STATIC] aggregate throughput: 0.190073026316 MOPS; normalized throughput: 27.6469856459, imbalanceratio: 1.0
[STATIC] average latency 94.8354988254 us, medium latency 57 us, 90P latency 90 us, 95P latency 123 us, 99P latency 1123 us
...
- Under {main client}, run
- Calculate throughput and latency of dynamic workload
- Under {main client}, run
bash scripts/remote/calculate_statistics.sh 0
- Supported experiments: exp7, exp8
- Output example:
...
[DYNAMIC] per-second statistics:
thpt (MOPS): [0.178, 0.245, ... , 0.215]
normalized thpt: [3.2962962962962963, 3.310810810810811, ... , 3.2575757575757573]
imbalanceratio: [1.0, 1.0, ... , 1.0153846153846153]
avg latency (us): [497.21625182852614, 517.8129587343011, ... , 501.72249302450666]
medium latency (us): [120, 95, ... , 132]
90P latency (us): [1487, 1571, ... , 1579]
95P latency (us): [1535, 1610, ... , 1642]
99P latency (us): [1614, 1687, ... , 1729]
[DYNAMIC][OVERALL] avgthpt 0.228106666667 MOPS, avelat 0 us, medlat 0 us, 90Plat 0 us, 95Plat 0 us, 99Plat 0 us
...
- Under {main client}, run
- Calculate control-plane bandwidth cost
- Under {main client}, run
bash scripts/remote/calculate_bwcost.sh
- Supported experiment: exp8
- Output example:
perserver avglocalbwcost: [0.18816512500000002, 0.18981975, ... , 0.19562]
average bwcost of entire control plane: 4.18830950595 MiB/s
- Under {main client}, run
- Calculate crash recovery time
- Under {main client}, run
bash scripts/remote/calculate_recovery_time.sh <roundnumber>
- Supported experiment: exp9
- Output example:
Server collect time: 1.0 s
Server preprocess time: 0.016991 s
Server replay time: 0.0106864375 s
Server total recovery time: 1.0276774375 s
Switch collect time: 0.9605 s
Switch replay time: 0.338202 s
Switch total recovery time: 1.298702 s
- Under {main client}, run
Workload Name | Scale | Bottleneck Serveridx |
---|---|---|
workload-load | 16 | 13 |
workloada | 16 | 14 |
workloadb | 16 | 14 |
workloadc | 16 | 14 |
workloadd | 16 | 15 |
workloadf | 16 | 14 |
synthetic | 16 | 14 |
synthetic-* | 16 | 14 |
valuesize-* | 16 | 14 |
skewness-90 | 16 | 8 |
skewness-95 | 16 | 8 |
uniform | 16 | 5 |
workloada | 32 | 29 |
workloada | 64 | 59 |
workloada | 128 | 118 |
- We have changed the parameters of some workload profiles in benchmark/ycsb/workloads/ for the synthetic workloads
- For write ratio (e.g., 25%), we change readproportion and updateproportion of workloads/synthetic to get workloads/synthetic-XXX (e.g., workloads/synthetic-25)
- For value size (e.g., 32), we change fieldlength of workloads/synthetic to get workloads/valuesize-XXX (e.g., workloads/valuesize-32)
- For skewness (e.g., 0.9), we change requestdistribution and zipfianconstant of workloads/synthetic to get workloads/skewness-XXX (e.g., workloads/skewness-90) and workloads/uniform
- Paths for raw statistics and aggregated results
- Under static pattern with server rotation
- {main client} and {secondary client} should dump raw statistics into
benchmark/output/{workloadname}-statistics/{method}-static{server rotation scale}-client{physicalidx}.out
(e.g., benchmark/output/workloada-statistics/farreach-static16-client0.out)
- Under dynamic pattern without server rotation
- {main client} and {secondary client} should dump raw statistics into
benchmark/output/{workloadname}-statistics/{method}-{dynamic pattern}-client{physicalidx}.out
(e.g., benchmark/output/synthetic-statistics/farreach-hotin-client0.out)
- If you use the manual way as in Section 4 for evaluation
- As the paths of raw statistics do NOT have other parameter information (e.g., write ratio or skewness), you need to aggregate the raw statistics before running the next experiment, which may overwrite them
- You can refer to Section 5 to get aggregated results for the current experiment
- You can also back up the raw statistics files of the current experiment if necessary (see the sketch after this list), as they will be overwritten next time
- Note: the scripts of automatic way for evaluation in Section 3 will automatically aggregate and backup raw statistics, so you do NOT need to do it manually
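As referenced above, a hypothetical backup before the next experiment overwrites the raw statistics (the destination directory and the workloada example are illustrative; adjust them to your own run):

```bash
# Back up the raw statistics of the current experiment (static pattern, workloada example)
BACKUPDIR=~/raw-statistics-backup/exp1-round0     # hypothetical destination
mkdir -p "${BACKUPDIR}"
cp -r benchmark/output/workloada-statistics "${BACKUPDIR}/"
```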
- Evaluation results in AE version may be larger
- Examples
- Take hotin pattern in exp_dynamic as an example
- In the evaluation version, FarReach can achieve around 0.23MOPS (nocache/netcache achieve 0.13MOPS) in Exp#7 and Exp#8
- In the latest AE version, FarReach can achieve around 0.28MOPS (nocache/netcache achieve 0.16MOPS) in Exp#7 and Exp#8
- Take static pattern in exp_recovery as an example
- In the evaluation version, FarReach can achieve around 2.0MOPS for the synthetic workload
- In the latest AE version, FarReach can achieve around 2.3MOPS for the synthetic workload
- Take hotin pattern in exp_dynamic as an example
- Reasons
- Main reason
- In the evaluation version, each server in our testbed only has 24 CPU cores
- However, in the latest AE version, due to the unavailability of our evaluation testbed, we use a new testbed where each server has 48 CPU cores
- More computation resources in servers can improve the performance of key-value storage
- Secondary reason
- In the evaluation version, we kill servers immediately after storing 100M records into server-side KVS for loading phase
- During evaluation, the server-side KVS still has background compaction operations (caused by the writes of the loading phase) and hence incurs some write stalls that reduce performance
- However, in the latest AE version, we kill servers 10 minutes after storing 100M records into the server-side KVS for the loading phase, such that all background compaction operations caused by the writes of the loading phase have completed
- During evaluation, the server-side KVS will NOT perform compaction operations for the writes of the loading phase and hence will NOT incur write stalls
- In the evaluation version, we kill servers immediately after storing 100M records into server-side KVS for loading phase
- Main reason
- Summary: we emphasize that the results are still reasonable
- First, the performance is also strongly related to the power of the specific testbed, so it is reasonable to see larger numbers under a more powerful testbed
- Second, both versions can achieve a fair comparison between FarReach and baselines, which does NOT affect our conclusions
- Example of hotin pattern in exp_dynamic: the throughput of nocache/netcache is always around 57% of that of farreach in each version
- Examples