/gigue

🕺 Interpretation loop and JIT code generator to benchmark RISC-V isolation mechanisms

Primary LanguagePythonMIT LicenseMIT

Gigue: Benchmark Setup and Code Generator for JIT code on RISC-V

GithubActions


Gigue (french for jitter) consists of a machine code generator for RISC-V that mimics the execution of JIT code in concordance with an interpretation loop. The objective is to compare memory isolation memory on a simple model and easily (re-)generate the corresponding machine code parts for both the interpretation loop and JITed code. The base model generates an interpretation loop, a succession of calls to the JIT code. It generates a static binary with both binaries (interpretation loop and JIT elements) along with data the JIT elements use and basic OS facilities to run on top of Verilator emulators from open-source cores (the Rocket CPU and CVA6 were tested!).

Installation

The project was developed using pipenv and Python 3.9. whose installation is presented below as well:

  • pyenv installation:
# Install required library headers for pyenv
sudo apt-get install build-essential zlib1g-dev libffi-dev libssl-dev libbz2-dev libreadline-dev libsqlite3-dev liblzma-dev

# Install pyenv to manage Python versions
curl https://pyenv.run | bash

# Update PATH (append these to ~/.bashrc)
export PYENV_ROOT="$HOME/.pyenv"
command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init -)"
  • pipenv installation:
# Install pip
sudo apt-get install python3-pip

# Install pipenv
pip install --user pipenv

# Update PATH (append these to ~/.bashrc)
export PIPENV_BIN="$HOME/.local/bin"
command -v pipenv >/dev/null || export PATH="$PIPENV_BIN:$PATH"
  • gigue installation:
# Install gigue and its pipenv environment
git clone git@github.com:QDucasse/gigue.git
pipenv install
pipenv shell

Gigue: CLI and Usage

Gigue supports three different usages and their corresponding CLIs: Gigue itself to generate one binary (and facilities to execute it), Toccata to generate several binaries, run them and extract information from the logs, and Prelude to display helper infos to implement custom instructions for toolchains/processors and generate minimal binaries containing these instructions.

Gigue: Binary Generation and Execution

The binary generator CLI and its main arguments are the following (others are defined in gigue/cli.py):

# Binary generator CLI
python -m gigue -h
usage: __main__.py 
    [-h] [-s SEED] [-a INTADDR] [-j JITADDR] [-t] 
    [-i ISOLATION] [-js JITSIZE] [-n NBMETH] [--regs REGS]
    [-vm VARMETH] [-vs STDEVMETH] 
    [-cm CALLMEAN] [-cs CALLSTDEV] [-cdm CALLDEPTHMEAN] 
    [--datareg DATAREG] [--datasize DATASIZE] [--datagen DATAGEN] 
    [-r PICRATIO] [--picmeancase PICMEANCASE] [--piccmpreg PICCMPREG]
    [--pichitcasereg PICHITCASEREG] 
    [-oi OUTINT] [-oj OUTJIT] [-od OUTDATA]

The whole list is available below but the most important ones are:

  • JIT size / methods number: define the size of methods and the binary
  • Method size variation parameters: define the amount of variation in method size
  • Call occupation parameters: define the call occupation of methods
  • PIC parameters: define the outline of PICs and their presence

An example generation command would be:

python -m gigue 
    -js 1500 -n  25      # Will create 25 methods of size 60
    -vm 0.2  -vs 0.1     # Size variation of 20% (10% std dev)
    -cm 0.2  -cs 0.1     # Call occupation of 20% (10% std dev)
    -cdm 2               # Mean call depth of 2
    --isolation rimifull # Domain isolation / shadow stack
    --datagen   random   # 200 bytes of random data
    --datasize  1600     #            -

This generates the following files in the bin/ directory:

  • int.bin: raw machine code of the interpretation loop (with trailing nops)
  • jit.bin: raw machine code of the JIT code region
  • data.bin: raw machine code of the generated data (following the data generation strategy)

To create a running executable, the environment variable RISCV should be defined to point to a working RISC-V toolchain (e.g.: export RISCV="/opt/riscv-newlib"), the instructions to setup a toolchain can be found on the `riscv-gnu-toolchain`` repo.

To run the executable on the Verilator model of a core, the environment variable EMULATOR should be defined to point to the directory containing the compiled Verilator model.

The Makefile then provides several commands:

  • make dump: (default) generates the executable binary and the different dumps
    1. compiles the different bare OS helpers (in resources/common).
    2. compiles the gigue binary using a template from templatesthat loads out.bin, loads data address in a register, and puts the data right after. The templates vary to provide ways to set up security modules (PMP), or relax the structure for instruction unit tests. This template can be specified using TEMPLATE=<template_name> (presented below).
    3. links them all together using a slightly modified linker script than the one provided in the riscv-tests test suite to generate an elf file.
    4. generates the dump of both the generated gigue binary alone (obtained by "forcing" a conversion using obj-copy before obj-dump) and the linked binary (and deleting the temporary files).
  • make exec runs the binary on top of the specified emulator, maximum test cycles can be specified with MAX_CYCLES=10000000.

For binaries, several templates are available that define additional subroutines for the binary. Among them:

  • base: both sides of the JIT generation (interpreter and binary) along with a .data section
  • pmp: base + a PMP config and setup
  • rimi: pmp + DMP config and setup
  • unit: includes unit tests examples
  • unitrimi: unit + PMP/DMP setup

By default, the template selected for Gigue is base/rimi and unit/unitrimi for unit tests. The template can be specified explicitely with for example:

make TEMPLATE=base

It is expecting the corresponding binaries (int.bin/jit.bin for base templates and unit.bin for unit tests).

Toccata: Running Benchmarks

Gigue provides another CLI to generate, run and qualify binaries named Toccata. All scripts are defined in the toccata section with the description of the data structures in data.py and the main element, runner.py.

The runner will (1) generate a binary, (2) compile it, and (3) run it on the specified emulator. The CLI provides two ways of qualifying a runner configuration:

  • (1) using a json config and providing it with:
python -m toccata config <your_file>.json

When using this method, a configuration file can be derived from the base one as defined in base_config.json:

{
    "nb_runs": 1,
    "run_seeds": [],
    "input_data": {
        "uses_trampolines": 1,
        "isolation_solution": "none",
        "registers": [5, 6, 7, 10, 11, 12, 13, 14, 15, 16, 17, 28, 29, 30, 31],
        "weights": [25, 30, 10, 5, 10, 10, 10],
        "interpreter_start_address": 0, 
        "jit_start_address": 12288,
        "jit_size": 10000,
        "jit_nb_methods": 100,
        "method_variation_mean": 0.2,
        "method_variation_stdev": 0.1,
        "call_depth_mean": 2,
        "call_occupation_mean": 0.2,
        "call_occupation_stdev": 0.1,
        "pics_ratio": 0.2,
        "pics_mean_case_nb": 1,
        "pics_cmp_reg": 6,
        "pics_hit_case_reg": 5,
        "data_reg": 31,
        "data_size": 1600,
        "data_generation_strategy": "random",
        "max_emu_cycles": 10000000
    }
}

It involves the same arguments as the ones the gigue CLI uses and more metadata such as the seed or the isolation solution used.

  • (2) using presets as defined in the CLI:
python -m toccata param -n low -c high --isolation rimifull

The following parameters are accessible (seen with python -m toccata param -h):

  • -r the number of runs for a similar config
  • -n the number of methods (low/medium/high)
  • -c the call occupations (low/medium/high)
  • -m the memory access intensities (low/medium/high)
  • -s the seeds
  • -i the isolation solution

For each method, the following actions are performed:

  1. checks the environment variables (RISCV for the toolchain and EMULATOR for the compiled emulator),
  2. loads the configuration file,

(for each run)

  1. generates the binary according to the input parameters,
  2. parses the elf dump to extract the start address, end address and return address,
  3. runs the binary on top of the emulator and extracts the number of cycles needed to run the binary,
  4. stores the dumps and core logs for each run in the corresponding toccata/results/<config_name>_<datetime>/<run_nb> directory and the data.json in the parent directory.

Note: each step contains a <step>_ok parameter that ensures the process went correctly (but does not stop the benchmark altogether).

Prelude: Helper and Minimal Binaries

Prelude provides simple helpers to display the changes needed to integrate custom instructions in well-known cores/toolchains. The helper functions available are gnu, rocket, and cva6, and are accessed with:

python -m prelude helper <helper_name>

Prelude also generates binaries with a minimal number of instructions to unit test custom instructions. Each new instruction is defined with an example (list of instructions) inside a tutorial. For example:

RIMI_TUTORIAL: Tutorial = Tutorial(
    examples=[
        # ==========================
        # load/stores:
        #    add value to check
        #    store value
        #    load value back
        # ==========================
        InstructionExample(
            ["lb1", "sb1"],
            [
                IInstruction.addi(rd=10, rs1=0, imm=0x12),
                RIMISInstruction.sb1(rs1=31, rs2=10, imm=0),
                RIMIIInstruction.lb1(rd=11, rs1=31, imm=0),
            ],
        ),
        ...
    ]
)

The same example is used for lb1 and sb1 as they load/store a value! The corresponding binary is generated using:

python -m prelude instr lb1 -t unit

Note: -t chooses the template for the binary generation as presented earlier.

Gigue code structure

The project consists of four main parts:

  • Instruction builder: generates instructions as defined in the constants and instructions source files.

    found in constants.py for the raw instruction data, instructions.py for instruction helpers, builder.py that defines static methods to generate both single instructions and machine code stubs (e.g. calls, switches, etc.)

  • JIT elements: model for the methods and PICs contained in the JIT binary.

    found in method.py, pic.py and trampoline.py for their respective classes

  • Generator: the global element that obtains parameters and generates the corresponding binaries through the process of: filling the JIT code with elements, generating instructions inside them, patching these instructions with calls to other elements, generating the interpretation loop then writing the binary files.

    found in generator.py for the main code and cli.py for argument passing.

  • Isolation Solutions: each isolation solution redefines its constants, additional builder utilities and a dedicated generator.

    found in their corresponding folders, e.g. fixer or rimi.

Development

Additional dependencies are required to run the project tests through pytest and tox for different environments (systems/python versions) that will then be used in the CI:

# Install test dependencies
pipenv install --dev
# Launching the tests
pytest
# Running all test environments
tox
# Running the linters and type checker (do it before pushing!)
tox -e check

The CI uses tox to run the different tools on the code before running the tests for each environment defined in the GitHub actions. Tools used in the CI are:

  • black, the code formatter
  • ruff, the import sorter and linter
  • mypy, the type checker

Unit tests use Unicorn as a CPU simulator and Capstone as a disassembler (along with an inhouse disassembler).

Parameters

The full list of the parameters defining Gigue and available through its CLI is the following:

Name Shortcut Description Default value
Seed -s Seed for generation replication bytes_to_int(os.urandom(16))
Interpreter start address -a Start address for the interpretation loop 0x0
JIT start address -j Start address for the JIT code region 0x2000
Isolation solution -i Chosen isolation setup none
Uses trampolines -not Revokes the use of trampolines (prevents the usage of some isolation solutions) false
JIT code region size -js Size in bytes of the JIT code region 1000
Methods number -n Number of methods in the JTI code region 100
Available registers --regs Usable registers for the generation RISC-V Callee-saved registers
Method variation (variance) -vm Mean variation of method sizes (parameter of TruncNorm distribution law) 0.2
Method variation (standard deviation) -vs Standard deviation of the variation of method sizes (parameter of a TruncNorm distribution law) 0.1
Call occupation (variance) -cm Mean call occupation of methods (parameter of a TruncNorm distribution law) 0.2
Call occupation (standard deviation) -cs Standard deviation of the call occupation of methods (parameter of a TruncNorm distribution law) 0.1
Mean call depth -cdm Mean of method call depths (parameter of a Poisson distribution law) 2
Data register --datareg Register storing the data address for simple offset access 31 (refers to X31)
Data size --datasize Size of the generated data 8 * 200
Data generation --datagen Data generation method (random, iterative, etc.) random
PIC ratio -r Amount of PIC generated compared to simple methods 0.2
PIC case number --picmeancase Mean number of PIC cases (parameter of a ZeroTruncPoisson distribution law) (1) 2
PIC cmp register --piccmpreg Register used to compare the class register to the expected value in the PIC switch case X6
PIC hit case register --pichitcasereg Class register X5
Interpreter binary name -oi - int.bin
JIT code binary name -oj - jit.bin
Data binary name -od - data.bin

(1) Note that the mean of a ZeroTruncPoisson distribution is not equal to its parameter but comes close to it as it becomes bigger.