bitcoin-dev-project/sim-ln

Design Doc: Generate Random Activity

carlaKC opened this issue · 2 comments

This issue is the proposed design document for #72.

Random Activity Generation

When we're looking to generate random activity for some topology, there are a few values that we need to pick:

  • How often do we fire payments?
  • How do we choose source and destination nodes?
  • How do we choose payment amounts?

Goals

  • Use the underlying graph topology to inform the types of payment flows generated for the network.
  • Introduce randomness to payment timing, amount and destination.
  • Scale payment activity by node size, meaning that nodes with more liquidity deployed will send payments more frequently.

User Configuration

Goals for user configuration of this functionality:

  1. Quickstart: just specify nodes and generate random activity for all nodes.
  2. Combo: allow combinations of activity_description and randomly generated activity.
  3. Subset: limit random activity to a subset of nodes.

This covers a reasonably large range of use cases without blowing up configuration complexity too much.

Proposed handling:

  • If no activity_description is present: run random activity with all nodes.
  • Add a random_activity section that is just a list of pubkeys.
    • May be present alongside an activity_description.
    • If specified, only run random activity on the nodes listed.

So, for our original goal use cases we'd have:

  1. Quickstart: nodes -> runs random activity on all nodes.
  2. Combo: nodes, activity_description, random_activity = [A, B] -> runs activity descriptions and random activity on A and B.
  3. Subset: nodes, random_activity = [B, C] -> runs random activity on a subset of nodes.
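
The three cases above can be sketched as a small resolution function. This is only an illustration in Rust: the function name, the use of plain strings for pubkeys, and the boolean flag for activity_description are assumptions, not existing sim-ln code. The final match arm (activity_description present, no random_activity) is also an assumption that no random activity should run in that case.

```rust
/// Hypothetical helper: decide which nodes run random activity.
/// `nodes` is every configured node, `random_activity` is the optional subset.
pub fn random_activity_nodes<'a>(
    nodes: &[&'a str],
    has_activity_description: bool,
    random_activity: Option<&[&'a str]>,
) -> Vec<&'a str> {
    match random_activity {
        // Combo / Subset: only the listed nodes generate random activity.
        Some(subset) => subset.to_vec(),
        // Quickstart: nothing else configured, so every node participates.
        None if !has_activity_description => nodes.to_vec(),
        // Assumption: defined activity only, so no random activity at all.
        None => Vec::new(),
    }
}
```

A real implementation would probably also want to reject pubkeys in random_activity that do not appear in nodes; that check is left out here.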

How often do we fire payments?

We can think about the level of network activity as a question of capital efficiency: "In a month of activity, how many times over do the nodes in the network send their total deployed capital?"

  • If it's 0.5x, the network sends half its total capacity in payments in a month.
  • If it's 1x, the network sends exactly its total capacity in payments in a month.
  • If it's 3x, the network sends three times its total capacity in payments in a month.

For our simulation, we will consider the following:

  • channel_capacity: the sum of the liquidity of a node’s announced channels / 2 (so that we don’t double count capacity for channel counterparties).
  • multiplier: the multiplier on channel_capacity that determines the total sent in a month.
  • expected_payment_size: 100_000 satoshis, the expected payment size in the network (further specified below).

For each node in the network we calculate the total number of events that we expect in a month:
total_events = (channel_capacity * multiplier) / expected_payment_size

An exponential distribution with lambda = total_events/seconds_per_month can be used to generate events at irregular intervals. Sampling this distribution returns the amount of time that the producer needs to sleep until the next event is fired.
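
As a rough sketch of the calculation above (not the project's implementation), the following Rust uses only the standard library. The function names, the 30-day month, and passing the uniform draw `u` in explicitly are assumptions made for illustration; a real producer would take `u` from a random number generator or use an exponential distribution type from a crate.

```rust
/// Assumption: a "month" is 30 days.
pub const SECONDS_PER_MONTH: f64 = 30.0 * 24.0 * 60.0 * 60.0;

/// total_events = (channel_capacity * multiplier) / expected_payment_size
pub fn total_events(
    channel_capacity_sat: u64,
    multiplier: f64,
    expected_payment_size_sat: u64,
) -> f64 {
    (channel_capacity_sat as f64 * multiplier) / expected_payment_size_sat as f64
}

/// Inverse-transform sample of an exponential distribution with rate `lambda`
/// (events per second). `u` must be a uniform draw in [0, 1).
/// Returns the number of seconds to sleep before the next event.
pub fn next_event_wait_secs(lambda: f64, u: f64) -> f64 {
    -(1.0 - u).ln() / lambda
}
```

For example, a node with 10,000,000 sats of capacity and a multiplier of 2 expects 200 events per month, so lambda = 200 / 2,592,000 and the average wait between payments is 12,960 seconds (3.6 hours).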

How do we choose destination nodes?

As outlined in the section above, nodes with more capital deployed are more likely to be chosen as the source for payments. Destination nodes are chosen using a weighted random distribution, using channel capacity as weights. This follows the intuition that nodes with more capital deployed on the network are more active.

For the first version of this feature, we will only consider destinations that are within our set of nodes. This ensures that liquidity does not leave our "system", and we're more likely to be able to continuously make payments. If we're okay with this drain, we can add an external flag to random_activity to specify that we want destinations that are outside of our control.
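
A minimal sketch of capacity-weighted selection, assuming the caller has already restricted the candidates to nodes inside our set. The function name and the explicit `roll` parameter (a uniform random integer supplied by the caller) are assumptions for illustration; in practice a weighted-index helper from a randomness crate would likely do this job.

```rust
/// Hypothetical helper: pick a destination with probability proportional to
/// its capacity. `candidates` holds (node name, capacity) pairs and `roll`
/// is expected to be a uniform draw in [0, total capacity).
pub fn pick_destination<'a>(candidates: &[(&'a str, u64)], roll: u64) -> Option<&'a str> {
    let total: u64 = candidates.iter().map(|(_, cap)| cap).sum();
    if total == 0 {
        return None; // no candidates, or none with any capacity
    }
    // Keep out-of-range rolls inside [0, total).
    let mut remaining = roll % total;
    for (name, cap) in candidates {
        // Each node owns a slice of [0, total) as wide as its capacity.
        if remaining < *cap {
            return Some(*name);
        }
        remaining -= cap;
    }
    None
}
```

With capacities A = 10, B = 30 and C = 60, rolls 0..=9 select A, 10..=39 select B and 40..=99 select C, so C is chosen 60% of the time.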

How do we choose payment amounts?

Payment amounts are chosen using a log normal distribution with a mean based on expected_payment_size and a variance that scales with the size of the sending and receiving nodes. This means that, on average, nodes will send the expected_payment_size, but larger nodes will be more likely to send a range of small and large amounts, while smaller nodes will be more likely to send amounts close to expected_payment_size.

The payment size will be capped at 50% of the MIN(sender capacity, receiver capacity) because we can roughly expect payment sizes to correspond to the amount of capital that the participants have on the network.

What does the code change look like?

For a first version, we'll assume a relatively static graph so that we don't need to worry about changes in channel capacity too much. The sections that follow outline logical chunks that the work can be divided into.

User Config Updates

  • Make activity_description optional.
  • Add parsing for random_activity and pass to Simulation::new.

Simulation Run

  • Move consume_events spawning out of generate_activity; instead, pass a HashMap<Pubkey, Sender> of consumers into generate_activity.
  • Spin up a consumer task for every node (right now we only create consumers for nodes that are source nodes).
  • Add a new generate_random_activity task, responsible for executing random activity, that accepts a HashMap<Pubkey, Sender>.
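
To make the "one consumer per node" shape concrete, here is a simplified sketch. It uses standard library threads and channels as stand-ins for the async tasks and channels in the codebase, and `Pubkey`, `PaymentEvent` and `spawn_consumers` are hypothetical names. The consumer body only totals the amounts it is asked to send, where a real consumer would dispatch payments.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Stand-in for a node public key.
pub type Pubkey = String;

/// Hypothetical event handed to the consumer of the sending node.
pub struct PaymentEvent {
    pub destination: Pubkey,
    pub amount_sat: u64,
}

/// Spawn one consumer per node and return the senders keyed by node,
/// ready to be handed to any activity generator.
pub fn spawn_consumers(
    nodes: &[Pubkey],
) -> (HashMap<Pubkey, Sender<PaymentEvent>>, Vec<JoinHandle<(Pubkey, u64)>>) {
    let mut senders = HashMap::new();
    let mut handles = Vec::new();
    for node in nodes {
        let (tx, rx) = channel::<PaymentEvent>();
        let name = node.clone();
        handles.push(thread::spawn(move || {
            // Runs until every sender for this node has been dropped.
            let total = rx.iter().map(|event| event.amount_sat).sum::<u64>();
            (name, total)
        }));
        senders.insert(node.clone(), tx);
    }
    (senders, handles)
}
```

Because consumers exist for every node up front, both the defined-activity generator and the random-activity generator can share the same map of senders.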

Event Producer

To produce random payment events, we’ll follow the existing producer/consumer pattern in the codebase. Each producer should be run as its own task and connected to a consumer via an mpsc channel.

Producer functionality:

  • Calculate: total_events = (channel_capacity * multiplier) / expected_payment_size
  • Create dist = ExponentialDistribution::new(total_events/seconds_per_month)
  • Get next_event_wait = dist.sample()
  • Select { sleep(next_event_wait), exit }
  • When sleep has elapsed, fire a payment event and re-sample for the next wait time.
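
The loop above could look roughly like the sketch below. It is a standard-library analogue rather than the async version the codebase would use: `recv_timeout` on a shutdown channel plays the role of `Select { sleep, exit }`. The names `run_producer` and `PaymentEvent` are hypothetical, and the wait time comes from a caller-supplied closure, which is where sampling the exponential distribution would go.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError, Sender};
use std::time::Duration;

/// Hypothetical event sent to the consumer for `source`.
#[derive(Debug, PartialEq)]
pub struct PaymentEvent {
    pub source: String,
}

/// Fire events at the intervals produced by `next_wait` until shutdown.
/// Returns the number of events fired.
pub fn run_producer(
    source: String,
    mut next_wait: impl FnMut() -> Duration,
    events: Sender<PaymentEvent>,
    shutdown: Receiver<()>,
) -> u64 {
    let mut fired = 0;
    loop {
        // Re-sample the wait before every event.
        let wait = next_wait();
        match shutdown.recv_timeout(wait) {
            // The wait elapsed without a shutdown signal: fire a payment event.
            Err(RecvTimeoutError::Timeout) => {
                let event = PaymentEvent { source: source.clone() };
                if events.send(event).is_err() {
                    return fired; // the consumer has gone away
                }
                fired += 1;
            }
            // Shutdown was requested, or the shutdown handle was dropped.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => return fired,
        }
    }
}
```

Taking the wait time from a closure keeps the loop independent of the distribution, so a test can drive it with fixed intervals.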

Destination Choice

Destination choice is simply a matter of using a weighted random distribution to select a destination node. Care should be taken to implement this choice with a level of abstraction in place to allow easy substitution in future.
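
One possible shape for that abstraction is a small trait, sketched below. The trait, the two implementations and the `choose` function are all hypothetical names, and `roll` stands in for a uniform random integer supplied by the caller. The uniform picker is included only to show that a different strategy can be swapped in without touching the caller.

```rust
/// Hypothetical abstraction over destination selection strategies.
pub trait DestinationPicker {
    /// Returns the index of the chosen candidate, or None if none is eligible.
    fn pick(&self, capacities: &[u64], roll: u64) -> Option<usize>;
}

/// Chooses candidates with probability proportional to capacity.
pub struct CapacityWeighted;

/// Alternative strategy: every candidate is equally likely.
pub struct Uniform;

impl DestinationPicker for CapacityWeighted {
    fn pick(&self, capacities: &[u64], roll: u64) -> Option<usize> {
        let total: u64 = capacities.iter().sum();
        if total == 0 {
            return None;
        }
        let mut remaining = roll % total;
        // Walk the candidates until the roll falls inside one's capacity slice.
        capacities.iter().position(|cap| {
            if remaining < *cap {
                true
            } else {
                remaining -= *cap;
                false
            }
        })
    }
}

impl DestinationPicker for Uniform {
    fn pick(&self, capacities: &[u64], roll: u64) -> Option<usize> {
        if capacities.is_empty() {
            None
        } else {
            Some((roll % capacities.len() as u64) as usize)
        }
    }
}

/// The caller depends only on the trait, so strategies are interchangeable.
pub fn choose(picker: &dyn DestinationPicker, capacities: &[u64], roll: u64) -> Option<usize> {
    picker.pick(capacities, roll)
}
```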

Amount Choice

Once a source and destination have been selected, their capacity is used to determine the variance of a log normal distribution that will produce payment amounts. We will limit the values generated for a given source and destination to half of the smaller of their average channel sizes:

  • payment_limit: 0.5 * MIN(source average channel size, destination average channel size)

To obtain a distribution that will produce payments around expected_payment_size with 95% of payments falling beneath our payment_limit, the distribution should be produced with the following parameters, which are the mean and standard deviation of the underlying normal distribution:

  • mean: 2 * ln(expected_payment_size) - ln(payment_limit)
  • std dev: sqrt(2 * (ln(payment_limit) - ln(expected_payment_size)))
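
The parameters above can be checked numerically with a short sketch. Function names are hypothetical, the standard normal draw `z` is passed in by the caller rather than sampled, and the guard requiring the limit to exceed the expected payment is an assumption added so that the square root is defined and positive.

```rust
/// payment_limit = 0.5 * MIN(source size, destination size)
pub fn payment_limit(source_size_sat: u64, destination_size_sat: u64) -> u64 {
    source_size_sat.min(destination_size_sat) / 2
}

/// Returns (mu, sigma) of the underlying normal distribution, or None when
/// the limit does not exceed the expected payment (assumed to be invalid).
pub fn log_normal_params(expected_payment_sat: f64, payment_limit_sat: f64) -> Option<(f64, f64)> {
    if expected_payment_sat <= 0.0 || payment_limit_sat <= expected_payment_sat {
        return None;
    }
    let ln_expected = expected_payment_sat.ln();
    let ln_limit = payment_limit_sat.ln();
    let mu = 2.0 * ln_expected - ln_limit;
    let sigma = (2.0 * (ln_limit - ln_expected)).sqrt();
    Some((mu, sigma))
}

/// Turn a standard normal draw `z` into a payment amount: exp(mu + sigma * z).
pub fn amount_from_normal_draw(mu: f64, sigma: f64, z: f64) -> f64 {
    (mu + sigma * z).exp()
}
```

With these parameters the mean of the log normal, exp(mu + sigma^2 / 2), works out to the expected payment, and the median, exp(mu), is expected^2 / limit. A draw of z = sigma lands on the limit, so the share of payments below the limit is Φ(sigma), where Φ is the standard normal CDF. That share therefore depends on the ratio of limit to expected payment, and equals 95% when sigma is roughly 1.645.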

Why does this suck?

  1. Assuming that every node in the network has the same capital efficiency isn't very realistic (some have better, some have worse).
  2. Weighting source and destination by channel capacity biases the simulation towards producing more activity between nodes with a lot of capital deployed, and makes payments between low capital nodes (such as individual raspberry pi operators) less likely.
  3. Some large nodes on the real network operate exclusively as routing nodes that do not originate payments - this solution will have them sending payments.

sr-gi commented
- channel_capacity: the sum of the liquidity of a node’s announced channels / 2 (so that we don’t double count capacity for channel counterparties).

Should we be calling this node_capacity instead? One of the reasons why this sounded a bit odd to me when we talked about it is because we are overloading the concept.

Also, should we replace expected_payment_size with expected_payment_amount? I think the latter makes more sense.

Should we be calling this node_capacity instead? One of the reasons why this sounded a bit odd to me when we talked about it is because we are overloading the concept.

Also, should we replace expected_payment_size with expected_payment_amount? I think the latter makes more sense.

Yeah I think both of those read better!