/omicron

Omicron: Oxide control plane

Primary LanguageRustMozilla Public License 2.0MPL-2.0

Oxide Control Plane

This repo houses the work-in-progress Oxide Rack control plane.

badge

Omicron is open-source. But we’re pretty focused on our own goals for the foreseeable future and not able to help external contributors. Please see CONTRIBUTING.md for more information.

Documentation

Docs are automatically generated for the public (externally-facing) API based on the OpenAPI spec that itself is automatically generated from the server implementation. You can generate your own docs for either the public API or any of the internal APIs by feeding the corresponding OpenAPI specs (in ./openapi) into an OpenAPI doc generator.

There are some internal design docs in the ./docs directory.

For more design documentation and internal Rust API docs, see the generated Rust documentation. You can generate this yourself with:

$ cargo doc --document-private-items

Note that --document-private-items is configured by default, so you can actually just use cargo doc.

Folks with access to Oxide RFDs may find RFD 48 ("Control Plane Requirements") and other control plane RFDs relevant. These are not currently publicly available.

Build and run

Omicron has two modes of operation: "simulated" and "non-simulated".

The simulated version of Omicron allows the high-level control plane logic to run without actually managing any sled-local resources. This version can be executed on Linux, Mac, and illumos. This mode of operation is provided for development and testing only.

To build and run the simulated version of Omicron, see: docs/how-to-run-simulated.adoc.

The non-simulated version of Omicron actually manages sled-local resources, and may only be executed on hosts running Helios. This mode of operation will be used in production.

To build and run the non-simulated version of Omicron, see: docs/how-to-run.adoc.

rustfmt and clippy

You can format the code using cargo fmt. Make sure to run this before pushing changes. The CI checks that the code is correctly formatted.

You can run the Clippy linter using cargo clippy -- -D warnings -A clippy::style. Make sure to run this before pushing changes. The CI checks that the code is clippy-clean.

Docker image

This repo includes a Dockerfile that builds an image containing the Nexus and sled agent. There’s a GitHub Actions workflow that builds and publishes the Docker image. This is used by cli for testing. This is not the way Omicron will be deployed on production systems, but it’s a useful vehicle for working with it.

Configuration reference

nexus requires a TOML configuration file. There’s an example in nexus/examples/config.toml.

Supported config properties include:

Name Example Required? Description

database.url

"postgresql://root@127.0.0.1:32221/omicron?sslmode=disable"

Yes

URL identifying the CockroachDB instance(s) to connect to. CockroachDB is used for all persistent data.

dropshot_external

Yes

Dropshot configuration for the external server (i.e., the one that operators and developers using the Oxide rack will use). Specific properties are documented below, but see the Dropshot README for details. Note that this is an array of external address configurations; multiple may be supplied.

dropshot_external.bind_address

"127.0.0.1:12220"

Yes

Specifies that the server should bind to the given IP address and TCP port for the external API (i.e., the one that operators and developers using the Oxide rack will use). In general, servers can bind to more than one IP address and port, but this is not (yet?) supported.

dropshot_external.request_body_max_bytes

1000

Yes

Specifies the maximum request body size for the external API.

dropshot_internal

Yes

Dropshot configuration for the internal server (i.e., the one used by the sled agent). Specific properties are documented below, but see the Dropshot README for details.

dropshot_internal.bind_address

"127.0.0.1:12220"

Yes

Specifies that the server should bind to the given IP address and TCP port for the internal API (i.e., the one used by the sled agent). In general, servers can bind to more than one IP address and port, but this is not (yet?) supported.

dropshot_internal.request_body_max_bytes

1000

Yes

Specifies the maximum request body size for the internal API.

id

"e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c"

Yes

Unique identifier for this Nexus instance

log

Yes

Logging configuration. Specific properties are documented below, but see the Dropshot README for details.

log.mode

"file"

Yes

Controls where server logging will go. Valid modes are "stderr-terminal" and "file". If the mode is `"stderr-terminal", human-readable output, with colors and other terminal formatting if possible, will be sent to stderr. If the mode is "file", Bunyan-format output will be sent to the filesystem path given by log.path. See also log.if_exists, which controls the behavior if the destination path already exists.

log.level

"info"

Yes

Specifies what severity of log messages should be included in the log. Valid values include "trace", "debug", "info", "warn", "error", and "critical", which are increasing order of severity. Log messages at the specified level and more severe levels will be included in the log.

log.path

"logs/server.log"

Only if log.mode = "file"

If log.mode is "file", this property determines the path to the log file. See also log.if_exists.

log.if_exists

"append"

Only if log.mode = "file"

If log.mode is "file", this property specifies what to do if the destination log file already exists. Valid values include "append" (which appends to the existing file), "truncate" (which truncates the existing file and then uses it as though it had just been created), and "fail" (which causes the server to exit immediately with an error).

Configuring ClickHouse

The ClickHouse binary uses several sources for its configuration. The binary expects an XML config file, usually named config.xml to be available, or one may be specified with the -C command-line flag. The binary also includes a minimal configuration embedded within it, which will be used if no configuration file is given or present in the current directory. The server also accepts command-line flags for overriding the values of the configuration parameters.

The packages downloaded by ci_download_clickhouse include a config.xml file with them. You should probably run ClickHouse via the omicron-dev tool, but if you decide to run it manually, you can start the server with:

$ /path/to/clickhouse server --config-file /path/to/config.xml

The configuration file contains a large number of parameters, but most of them are described with comments in the included config.xml, or you may learn more about them here and here. Parameters may be updated in the config.xml, and the server will automatically reload them. You may also specify many of them on the command-line with:

$ /path/to/clickhouse server --config-file /path/to/config.xml -- --param_name param_value ...

Generated Service Clients and Updating

Each service is a Dropshot server that presents an HTTP API. The description of that API is serialized as an OpenAPI document which we store in omicron/openapi and check in to this repo. In order to ensure that changes to those APIs are made intentionally, each service contains a test that validates that the current API matches. This allows us 1. to catch accidental changes as test failures and 2. to explicitly observe API changes during code review (and in the git history).

We also use these OpenAPI documents as the source for the clients we generate using Progenitor. Clients are automatically updated when the coresponding OpenAPI document is modified.

Note that Omicron contains a nominally circular dependency:

  • Nexus depends on the Sled Agent client

  • The Sled Agent client is derived from the OpenAPI document emitted by Sled Agent

  • Sled Agent depends on the Nexus client

  • The Nexus client is derived from the OpenAPI document emitted by Nexus

We effectively "break" this circular dependency by virtue of the OpenAPI documents being checked in.

In general, changes any service API require the following set of build steps:

  • Make changes to the service API

  • Build the package for the modified service alone. This can be done by changing directories there, or cargo build -p <package>. This is step is important, to avoid the circular dependency at this point. One needs to update this one OpenAPI document, without rebuilding the other components that depend on a now-outdated spec.

  • Update the OpenAPI document by running the relevant test with overwrite set: EXPECTORATE=overwrite cargo test test_nexus_openapi_internal (changing the test name as necessary)

  • This will cause the generated client to be updated which may break the build for dependent consumers

  • Modify any dependent services to fix calls to the generated client

Note that if you make changes to both Nexus and Sled Agent simultaneously, you may end up in a spot where neither can build and therefore neither OpenAPI document can be generated. In this case, revert or comment out changes in one so that the OpenAPI document can be generated.