ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling

ghOSt is a general-purpose delegation of scheduling policy implemented on top of the Linux kernel. The ghOSt framework provides a rich API that receives scheduling decisions for processes from userspace and actuates them as transactions. Programmers can use any language or tools to develop policies, which can be upgraded without a machine reboot. ghOSt supports policies for a range of scheduling objectives, from µs-scale latency, to throughput, to energy efficiency, and beyond, and incurs low overheads for scheduling actions. Many policies are just a few hundred lines of code. Overall, ghOSt provides a performant framework for delegation of thread scheduling policy to userspace processes that enables policy optimization, non-disruptive upgrades, and fault isolation.
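
As a rough illustration of the programming model, the sketch below shows a scheduling decision expressed as a transaction. The Task, Transaction, and PickNext names are invented for this example and are not the actual lib/ API; see schedulers/ for real policies.

// Conceptual sketch only: Task, Transaction, and PickNext are hypothetical
// names, not the real ghOSt userspace API in lib/.
#include <cstdint>
#include <deque>
#include <iostream>

struct Task { int64_t gtid; };                  // a thread handed to the policy
struct Transaction { int cpu; int64_t gtid; };  // "run this task on this CPU"

// A policy is ordinary userspace code: it orders its runqueue however it
// likes and emits each decision as a transaction for the kernel to actuate.
Transaction PickNext(std::deque<Task>& runqueue, int cpu) {
  Task next = runqueue.front();  // FIFO order, as in the per-CPU FIFO scheduler
  runqueue.pop_front();
  return Transaction{cpu, next.gtid};
}

int main() {
  std::deque<Task> runqueue = {{101}, {102}};
  Transaction txn = PickNext(runqueue, /*cpu=*/0);
  std::cout << "commit: gtid " << txn.gtid << " -> cpu " << txn.cpu << "\n";
  return 0;
}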

SOSP '21 Paper
SOSP '21 Talk

The ghOSt kernel lives in a separate repository. You must compile and run the userspace component on the ghOSt kernel.

This is not an officially supported Google product.


Compilation

The ghOSt userspace component can be compiled on Ubuntu 20.04 or newer.

1. We use the Google Bazel build system to compile the userspace components of ghOSt. Go to the Bazel Installation Guide for instructions to install Bazel on your operating system.

2. Install ghOSt dependencies:

sudo apt update
sudo apt install libnuma-dev libcap-dev libelf-dev libbfd-dev gcc clang-12 llvm zlib1g-dev python-is-python3

Note that ghOSt requires GCC 9 or newer and Clang 12 or newer.

3. Compile the ghOSt userspace component. Run the following from the root of the repository:

bazel build -c opt ...

-c opt tells Bazel to build the targets with optimizations turned on. The ... target pattern tells Bazel to build all targets in the top-level BUILD file and in all BUILD files in subdirectories, including the core ghOSt library, the eBPF code, the schedulers, the unit tests, the experiments, and the scripts to run the experiments, along with all of their dependencies. To save compile time, you can build an individual target instead of all of them by replacing ... with the target name, such as agent_shinjuku.


ghOSt Project Layout

  • bpf/user/
    • ghOSt contains a suite of BPF tools to assist with debugging and performance optimization. The userspace components of these tools are in this directory.
  • experiments/
    • The RocksDB and antagonist experiments (from our SOSP paper) and microbenchmarks. Use the Python scripts in experiments/scripts/ to run the Shinjuku experiments.
  • kernel/
    • Headers containing data structures shared by the kernel and userspace.
  • lib/
    • The core ghOSt userspace library.
  • schedulers/
    • ghOSt schedulers. These schedulers include:
      • biff/, Biff (bare-bones FIFO scheduler that schedules everything with BPF code)
      • cfs/, CFS (ghOSt implementation of the Linux Completely Fair Scheduler policy)
      • edf/, EDF (Earliest Deadline First)
      • fifo/centralized/, Centralized FIFO
      • fifo/per_cpu/, Per-CPU FIFO
      • shinjuku/, Shinjuku
      • sol/, Speed-of-Light (bare-bones centralized FIFO scheduler that runs as fast as possible)
  • shared/
    • Classes to support shared-memory communication between a scheduler and other applications. Generally, this communication is useful for an application to send scheduling hints to the scheduler (see the sketch after this list).
  • tests/
    • ghOSt unit tests.
  • third_party/
    • bpf/
      • Contains the kernel BPF code for our suite of BPF tools (mentioned above). This kernel BPF code is licensed under GPLv2, so we must keep it in third_party/.
    • The rest of third_party/ contains code from third-party developers and BUILD files to compile the code.
  • util/
    • Helper utilities for ghOSt. For example, pushtosched can be used to move a batch of kernel threads from the ghOSt scheduling class to CFS (SCHED_OTHER).
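
To make the shared/ use case concrete, here is a minimal sketch of an application publishing a scheduling hint through POSIX shared memory. The segment name and the SchedHint layout are invented for this example; the real helpers are the classes in shared/.

// Illustrative sketch: "/ghost_hint_example" and SchedHint are made up for
// this example; the real shared-memory helpers are the classes in shared/.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>

struct SchedHint {
  int64_t pid;           // which task the hint applies to (illustrative)
  uint32_t deadline_us;  // e.g., a deadline an EDF-style policy could honor
};

int main() {
  // Application side: create the segment and publish a hint.
  int fd = shm_open("/ghost_hint_example", O_CREAT | O_RDWR, 0644);
  if (fd < 0) return 1;
  if (ftruncate(fd, sizeof(SchedHint)) != 0) return 1;
  void* mem = mmap(nullptr, sizeof(SchedHint), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
  if (mem == MAP_FAILED) return 1;
  auto* hint = static_cast<SchedHint*>(mem);
  hint->pid = static_cast<int64_t>(getpid());
  hint->deadline_us = 500;
  // A scheduler process would mmap the same segment (read-only) and consult
  // the hint when ordering its runqueue.
  munmap(mem, sizeof(SchedHint));
  close(fd);
  return 0;
}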

Running the ghOSt Tests

We include many different tests to ensure that both the ghOSt userspace code and the ghOSt kernel code are working correctly. Some of these tests are in tests/ while others are in other subdirectories. To view all of the tests, run:

bazel query 'tests(//...)'

To build a test, such as agent_test, run:

bazel build -c opt agent_test

To run a test, launch the test binary directly:

bazel-bin/agent_test

Generally, Bazel encourages the use of bazel test to run tests. However, bazel test sandboxes the tests: they get read-only access to /sys and are constrained in how long they may run. The ghOSt tests need write access to /sys/fs/ghost to coordinate with the kernel and may take a long time to complete, so to avoid sandboxing, launch the test binaries directly (e.g., bazel-bin/agent_test).


Running a ghOSt Scheduler

We will run the per-CPU FIFO ghOSt scheduler and use it to schedule Linux pthreads.

  1. Build the per-CPU FIFO scheduler:
bazel build -c opt fifo_per_cpu_agent
  2. Build simple_exp, which launches a series of pthreads that run in ghOSt. simple_exp is a collection of tests.
bazel build -c opt simple_exp
  3. Launch the per-CPU FIFO ghOSt scheduler:
bazel-bin/fifo_per_cpu_agent --ghost_cpus 0-1

The scheduler launches ghOSt agents on CPUs (i.e., logical cores) 0 and 1 and will therefore schedule ghOSt tasks onto CPUs 0 and 1. Adjust the --ghost_cpus command line argument value as necessary. For example, if you have an 8-core machine and you wish to schedule ghOSt tasks on all cores, then pass 0-7 to --ghost_cpus.

  4. Launch simple_exp:
bazel-bin/simple_exp

simple_exp will launch pthreads. These pthreads in turn will move themselves into the ghOSt scheduling class (see the sketch after these steps) and thus will be scheduled by the ghOSt scheduler. When simple_exp has finished running all tests, it will exit.

  5. Use Ctrl-C to send a SIGINT signal to fifo_per_cpu_agent to get it to stop.
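
For a sense of what the simple_exp pthreads do, here is a minimal sketch of a thread asking to be scheduled by ghOSt via sched_setscheduler. The SCHED_GHOST value below is a placeholder; the real policy number comes from the ghOSt kernel headers under kernel/, and simple_exp uses the wrappers in lib/ rather than the raw syscall.

// Minimal sketch, not the code simple_exp actually runs: the policy number
// is a placeholder for the value exported by the ghOSt kernel headers.
#include <sched.h>
#include <cstdio>
#include <thread>

#ifndef SCHED_GHOST
#define SCHED_GHOST 18  // hypothetical value, for illustration only
#endif

int main() {
  std::thread worker([] {
    struct sched_param param = {};  // priority field unused in this sketch
    // Ask the kernel to move this thread into the ghOSt scheduling class;
    // the running ghOSt agents then decide when and where it runs.
    if (sched_setscheduler(0, SCHED_GHOST, &param) != 0) {
      perror("sched_setscheduler(SCHED_GHOST)");
      return;  // on failure the thread stays in its current class (e.g., CFS)
    }
    // ... do work while scheduled by the ghOSt policy ...
  });
  worker.join();
  return 0;
}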

Enclaves, Rebootless Upgrades, and Handling Scheduler Failures

ghOSt uses enclaves to group agents and the threads that they are scheduling. An enclave contains a subset of CPUs (i.e., logical cores) in a machine, the agents that embody those CPUs, and the threads in the ghOSt scheduling class that the enclave agents can schedule onto the enclave CPUs. For example, in the fifo_per_cpu_agent example above, an enclave is created that contains CPUs 0 and 1, though an enclave can be configured to contain any subset of the CPUs in the machine, or even all of them. In that example, two per-CPU FIFO agents enter the enclave, and the simple_exp threads join it when the simple_exp process is started.

Enclaves provide an easy way to partition the machine to support co-location of policies and tenants, which is particularly important as machines grow to contain hundreds of CPUs and new accelerators. Multiple enclaves can therefore be constructed with disjoint sets of CPUs.

Rebootless Upgrades

ghOSt supports rebootless upgrades of scheduling policies, using an enclave to encapsulate the current thread and CPU state for a policy undergoing an upgrade. To upgrade a policy, launch the new agent process: its agents attempt to attach to the existing enclave and wait for the old agents running in the enclave to exit. Once the old agents exit, the new agents take over the enclave and begin scheduling.

Handling Scheduler Failures

ghOSt also recovers from scheduler failures (e.g., crashes or malfunctions) without triggering a kernel panic or machine reboot. To recover from a scheduler failure, you should generally destroy the failed scheduler's enclave and then launch the scheduler again. Destroying an enclave kills the malfunctioning agents if necessary and moves the threads in the ghOSt scheduling class to CFS (the Linux Completely Fair Scheduler) so that they continue to be scheduled until you pull them back into ghOSt.

To see all enclaves that currently exist in ghOSt, use ls to list them via ghostfs:

$ ls /sys/fs/ghost
ctl  enclave_1	version

To kill an enclave, such as enclave_1 above, run the following command, replacing enclave_1 with the name of the enclave:

echo destroy > /sys/fs/ghost/enclave_1/ctl

To kill all enclaves (which is generally useful in development), run the following command:

for i in /sys/fs/ghost/enclave_*/ctl; do echo destroy > $i; done