bootstrap

This is a CloudFlare Worker that allows holochain networks to bootstrap.

Installing

The worker code is all written in typescript using npm.

Tested on CI against node major versions 12 and 14 on ubuntu.

Standard npm install to install.

Standard npm test to test. Unit tests will be run via ts-node/ts-mocha. Integration tests use the provided run-integration-test.js script which first launches a miniflare local cloudflare simulator, then executes the integration test suite making api calls against this simulator.

Forking

We expect and encourage developers to fork and use this code for deployment on their own CloudFlare account.

Simply update the wrangler.toml with your CF account details and ensure that github secrets have CF_API_TOKEN set for production deployment.

Why do holochain networks need a bootstrap service?

tl;dr: to mitigate eclipse attacks.

Holochain is commonly discussed in terms of 'the DHT' (Distributed Hash Table).

Data is distributed across all the nodes on 'the network' and validated by a deterministic but random (based on cryptographic hash) set of nodes according to a set of rules (defined in wasm) implemented as callback functions.

All of that happens in the 'conductor'.

Nodes send data to each other to maintain 'the DHT' without central servers.

Basic DHT behaviour includes (for example):

Redundant data storage and healing as nodes join and leave the network
Identifying and mitigating bad actors
Direct p2p realtime communications and RPC style wasm calls

Being more specific there are two DHTs.

One DHT tracks the signed (by the node) ephemeral network locations of nodes in parallel to the 'main DHT' that handles all the holochain data and validation.

We can call this the 'agent DHT', it is much simpler:

The rules for agent data validation is hardcoded into the networking layer
It only tracks agent IDs (pubkeys) and spaces (DHT/DNA hashes)
It requires agents to pass a challenge to participate

Unlike the data DHT, where there may be many TBs of data in a large, active DHT, the agent info is relatively small and each agent can only broadcast a single agent info data point per space.

It's expected that each agent can hold a significantly higher percentage of the agent DHT data than a typical data DHT. Any agent lookup can often be completed in zero or not many hops.

The agent's signature of their current network location is returned alongside their information and validated. Malicious actors on the network cannot tamper with another agent's location, the worst they can do is withold another agent's location, but as long as at least one honest agent is returning the signed agent location, that agent is discoverable.

At this point we still have two big, obvious problems:

How do we handle firewalls etc.?
How does a new node safely find an honest node in the first place?

The solution to the first problem is handled via. the holochain proxy and is totally different and separate to the bootstrap service.

The proxy allows nodes that are already aware of each other indirectly to open connections to each other directly, regardless of firewalls, etc.

The bootstrap service allows nodes to advertise their current network location independant of the agent DHT.

For example, this repository implements a bootstrap service as:

A simple POST based API that accepts signed agent info
A CloudFlare backed key/value store
Agent information automaticaly expires (is deleted)
The service can be forked/copied by any hApp developer and deployed to their own CloudFlare account
'Trusted' agent public keys can be set by the service owner to further mitigate eclipse attacks at the expense of needing to maintain high(ish) availability nodes (not implemented yet)

This allows nodes that want to safely join a DHT space to prepopulate their agent locations with everyone advertising themselves periodically.

Limitations of the boostrap service

There are some obvious limits of the bootstrap service as currently implemented.

Some of these limitations can be mitigated relatively easily and others need more effort or domain specific solutions.

Sybil ghost network

It's pretty easy for someone to spam the kv store with apparently valid data that has been signed by a garbage keypair and doesn't lead anywhere.

This would create a ghost network where so few listed agents are real that a new user cannot open any useful connections.

It also puts pressure on the server, which in this case is CloudFlare so I'm sure they can handle it, but it may result in additional costs for the account owner.

Mitigations:

Trust and delegated trust model (need to be a dev or approved by a dev)
Identiy/auth based challenge (e.g. DPKI)
Anti-spam/throttling challenges (e.g. proof of work)
Proof of unique human (e.g. QR code systems like bright ID)

Anything that meaningfully raises the bar for entry above 'can sign data' is useful mitigation here.

At the time of writing we are simply expiring all key/value pairs after some time, which is a relatively weak challenge but at least sybils will fade quickly unless there is a dedicated machine somewhere actively generating them over a long period of time.

Additionally, CloudFlare themselves implement anti-bot protections at the network layer that we passively benefit from simply by using their service.

Eclipse attack

Similar to the ghost town situation, a more sophisticated attack generates a large number of agents that do resolve to a real connection.

The real but malicious connections 'fork' new users off onto a parallel set of DHTs. The sheer number of fake accounts defeats the bootstrap service as a means to avoid eclipse attacks because some percentage of new users will never find an honest signal among the malicious noise.

For an arbitrarily sophisticated sybil we can't hope to automatically detect them at the bootstrap service level with an algorithm.

At some point an element of trust will need to be applied to the bootstrapping process.

Even monero, a world class privacy and trust minimised blockchain, relies on a website Monero World to list out some trusted nodes that can bootstrap new users onto the monero network safely. This relies on users finding the list when they start their wallet, and downloading a safe wallet in the first place, and trusting the developers that write the monero code, and the machine the user runs the code on... etc.

The best we can do is to allow for decentralisation so that no party can force themselves to be 'trusted' and to implement an explicit trust model so that agents can be vetted by each other.

This is similar to the ERC-20 lists used by uniswap, there is a default list maintained by uniswap and then several dozen community maintained lists. Users of uniswap then select which list they'd like to opt-in to in order to be protected against phishing and other scams.

This bootstrap service currently has no concept of trust in it but it will in the future.

The random endpoint does enforce that random agents are returned from the running service, so that a client cannot be tricked into selecting specific agents from the listings. This puts additional trust on CloudFlare (see below).

With any trust model, there will be some set of public keys that agents would be strongly encouraged to prioritise when joining a network.

These public keys would be set after some kind of elevated access challenge. For example:

Set directly in the CloudFlare interface by the account owner (developer)
Requiring a signature from another already-trusted agent (delegation)
Requiring a signature from some external system (identity)
Some other challenge (algorithmic, API key, etc.)

So then the service owner sets the trust model, populates and maintains the trusted public keys, then end-users opt in to a bootstrap service that they decide to trust the operator and model of.

DOS attack

CloudFlare themselves are one of the world leaders in mitigating DOS attacks for their clients.

The logic in the workers is limited to simple cryptographic checks and direct interactions with the CloudFlre kv store.

It's unlikely that an attacker could exploit something in this repository that brings down the service for honest users. The worst they could do is trip the 10-50ms CPU circuit breaker on an individual request and see a 500 error for themselves.

Trusting CloudFlare

Of course, all this talk of explicit trust is ignoring the need to implicitly trust CloudFlare as the infrastructure provider of the kv service.

Given that we're cryptographically signing absolutely everything, the damage that CloudFlare can do is limited to witholding data or failing to provide their service. They cannot tamper with or inject any additional data.

CloudFlare returns the original agent info bytes alongside the agent pubkey and signature to all agents, so that every agent can independently verify the data. Agents do not need to trust that CloudFlare has not tampered with the data because they SHOULD do their own cryptographic verification of all data returned from any boostrap service. This removes the tempation for an attacker to attempt to hijack a bootstrap service to invisibly serve up bad network agent network locations.

To mitigate the need to trust CloudFlare in general we have a well defined and very simple POST API that most web developers could be confident in implementing correctly. This way they can build their own binaries and servers that are compatible with the holochain conductors and host these anywhere.

The main concern is that the random op is opaque from the caller's point of view and this is where CloudFlare could collude with (or be hacked by) an attacker to "randomly" only return malicious nodes.

The main two defenses against CloudFlare are (not implemented yet):

Developers easily forking the service and users easily configuring many bootstrap services in conductors (decentralised bootstrap).
Signed responses from bootstrap services as part of the API so that agents can audit and cross-reference responses against what is found in the DHT and/or other services, similar to how time audits work in the Roughtime protocol.

API

POST method

All API requests use the HTTP POST method.

This is for both gets and sets.

This is so that we can use messagepack binary data as-is for all requests and responses with no 'extra steps' like handling base64 encoding/decoding and URL parsing just to work with binary data.

GET ping

There is one GET endpoint, used for debugging, testing and health checks.

All GET requests to the bootstrap service will receive the response string OK encoded as UTF-8 text data and the 200 status code.

The GET endpoint behaves differently to all other methods in that it is not serialized, is not binary data, has no op headers and cannot interact with the kv service at all.

MessagePack serialization

All requests and responses are serialized as binary MessagePack data in the body of the request/response.

The op header (see below) defines which operation the messagepack payload in a request is dispatched to and what kind of response will be returned.

This is the same serialization format that holochain itself uses for wire messages on the network and for compatibility with wasm for validation logic.

Ed25519/NaCl crytography

All cryptographic logic is handled as per libsodium using the Ed25519 curve.

This is the same cryptography as holochain itself which means the signatures and validation used by the bootstrap service are the same as those used by the agent DHT by conductors.

This implementation uses tweetnacl for validation but does not ever generate any signatures.

Headers

The Content-Type header for all POST requests must be application/octet to signify to the server that the body of the request is a binary payload.

The action to be performed is set by the X-Op header in the POST request.

The possible values are:

put: store signed agent info
random: retrieve up to N random agents
now: get the current server time as a unix milliseconds timestamp

Data structures

Agent info

AgentInfoSigned is the main data structure.

It is the same structure defined by kitsune_p2p in the Rust codebase for the conductor, but ported to typescript for validation here.

It looks like this on the wire:

{
 signature: Uint8Array,
 agent: Uint8Array,
 agent_info: Uint8Array,
}

Where the agent_info is messagepack serialized binary data that MUST be valid for the agent public key and signature bytes according to libsodium.

If the agent_info is not valid then it MUST be discarded and any further logic abandoned because this is ALWAYS malicious or corrupt data. The inner agent_info MUST NOT be deserialized if it is invalid.

When the agent_info is validated and unpacked it looks like this:

agent_info: {
 space: Uint8Array,
 agent: Uint8Array,
 urls: Array<string>,
 signed_at_ms: number,
}

space is the bytes of the hash of the DNA used to connect to the DHT
agent is the same as the signing key above and MUST match it
urls is an array of strings that are the URLs the agent can be found at
signed_at_ms is the unix millisecond timestamp of the signing

The AgentInfoSigned packed data is saved and retrieved by the bootstrap service. We store exactly what is given to us by the agent alongside its cryptographic integrity and authenticity proof (signature).

This allows all agents using the bootstrap service to redundantly verify the data for themselves which is an important hedge against a compromised service.

KV keys

Valid AgentInfoSigned data is stored under the binary concatenation of space + agent in the CloudFlare kv store.

For example, if there was a space [1, 2, 3] and agent [4, 5, 6] the kv key would be [1, 2, 3, 4, 5, 6].

Technically CloudFlare kv does not support prefix based lookups (which we need) for raw binary keys, so internally we base64 the space and agent separately before concatenating them. This is an internal implementation detail only so any attempt to externally interact with keys as base64 data will fail because the input/output will be treated as raw binary bytes, not utf8 encoded data.

This is more efficient on the wire and decouples the messagepack binary API design from the CloudFlare key prefix lookup implementation.

For example, when performing a get op the POST body would contain the space and agent key raw binary bytes, not a base64 or utf8 representation of these.

Ops

Put

Put a signed agent info into the kv store.

X-Op header: put

Request body: Messagepack serialized AgentInfoSigned data (see above).

Successful response: Messagepack encoded null, i.e. [ 192 ] binary body.

If the AgentInfoSigned data validates on the CloudFlare worker it will be saved under the kv key (see above) for the parsed space and agent.

The value will expire (be deleted) at signing time + expires after, as per the signed agent information.

The expectation is that agents repost their current location periodically to maintain liveness.

Random

Get up to limit random AgentInfoSigned for a given space.

X-Op header: random

Request body: Messagepack serialized { space: Uint8Array, limit: number }. The limit must be a positive integer.

Successful response: Messagepack serialized array of AgentInfoSigned data. If there are at least limit agents in the space then there will always be limit random agents returned. If there are less than limit agents in the space then limit agents will be returned in random order. If there are no agents a messagepack empty array, i.e. [221, 0, 0, 0, 0].

This is the default and recommended way for an agent to fetch node information as it balances network efficiency against eclipse mitigation via randomness.

Agents are encouraged to fetch as many random agents as they can comfortably handle to maximise the diversity of their view on the network before they attempt to join, which has benefits beyond eclipse protection.

Now

Get the time 'now' from the service as a millisecond unix timestamp.

The signed_at_ms SHOULD be in the past from the perspective of the bootstrap service but this is NOT enforced. The correctness of times remains between agents and so the bootstrap service is agnostic to far future signed and expiry times. That said, the bootstrap service will not hold data longer than an hour and peers are free to ignore bad times (in their opinion).

If an agent wants some assurance that the signing time will be accepted by their peers it can first call now and then use the returned timestamp for signing.

An agent can call now once upon booting a conductor and then calculate an offset relative to their agent local time, then use the offset for as long as it is safe to assume that the local clock has not shifted.

A full clock sync algorithm like (S)NTP is NOT required, the signing time simply needs to be within a few seconds on both machines and in the past from the perspective of the service. A simple min comparison with the local time, or even direct deferral to the server time is likely sufficient.

Validation

Validation rules are well defined for signed agent info and all other binary data is fixed sized.

SignedAgentInfo validation

Validation is a 'chained' operation in that each step of the validation will be attempting to verify some aspect of the data. Any step that fails MUST abort the entire validation chain as a failure to validate the data. That is to say, any corrupt or bad data MUST immediately stop validations and return an error. The error SHOULD be descriptive to aid logging and debugging.

The raw SignedAgentInfo on the wire will be a messagepacked object with keys signature, agent, and agent_info and binary array values.
Attempt to decode the messagepack data into the object.
Check that the signature is 64 bytes long, as per Ed25519 signatures.
Check that the agent pubkey is 32 bytes long, as per Ed25519 public keys.
Use libsodium to verify the agent_info bytes using the signature and agent pubkey, as per Ed25519.
IF the signature is valid, attempt to deserialize the agent_info bytes using messagepack to an AgentInfo object (see above).
Check the space is 32 bytes long, as per base HoloHash bytes.
Check the agent is 32 bytes long, as per Ed25519 public keys.
Check the agent bytes are equal to the agent bytes used to verify the signature above.
Check the urls is an array of utf8 strings.
Check there are 256 or fewer urls in the array.
Check every url is 2048 or fewer utf8 bytes, e.g. utf8 multibyte characters are counted as several bytes towards the limit.
Check the signed_at_ms is an integer.
Check the signed_at_ms is a positive number.
Check the expires_after_ms is an integer.
Check the expires_after_ms is between MIN_EXPIRES and MAX_EXPIRES. These are currently 1 minute and 1 hour respectively, in milliseconds.

holochain/bootstrap