Light nodes
UkoeHB opened this issue · 1 comments
Light nodes for wallets
Requirements/objectives
- Access blockchain data with reasonable confidence as to its legitimacy.
- Construct transactions without the aid of another node.
- Minimize data storage and verification requirements so points (1) and (2) are reasonably accessible to common users, and in the ideal case feasible for mobile devices.
Note1: this Lee-Miller paper (prefixing the url with 'sci-hub.tw/' might gain access to the pdf) describes a similar approach, in which wallets still have to cooperate with remote nodes to build transactions. My proposal is a more advanced self-reliant local light node which focuses on maximal pruning for user convenience, without sacrificing the basic abilities of a node. The Lee-Miller suggestion for conflict resolution between different queried nodes might be worth further consideration.
Note2: my proposal may be similar to SPV Bitcoin nodes, although I have not looked into it.
Method
- Assume blockchain data with a lot of proof-of-work stacked on top is legitimate. This means the light node only verifies block headers meet the proof of work requirement, and takes note of how much proof of work has actually occurred since a given block was mined. Such a node should, like any other node, focus on maintaining the chain with highest cumulative difficulty.
a) The light node can periodically poll other nodes for copies of high-level merkle roots, to try and find hints that the network is on the same page or not. - Assume the light node is set up for a very specific wallet. Most data not related to that wallet can be discarded once it has been deemed unnecessary.
- Instead of storing all block headers and block IDs, just push them into large merkle trees and only store the merkle roots. For example, 1000 blocks can be kept track of with only one 32byte merkle root.
a) To retain some flexibility in the case of porting a light node over to a new wallet, the merkle trees, instead of just taking block IDs as input, take a hash of the block ID PLUS all information related to scanning a new wallet that isn't otherwise stored. This includes (on a per block basis) for each transaction, the list of key images followed by list of tx public keys followed by list of encoded amounts (also commitment masks from the pre-optimization period) (tx infoA). We also include a hash of information that IS stored, to facilitate extra pruning (see below), the list of one time address and their respective output commitments (tx infoB). In other words, H(blockID, H(tx1 infoA), H(tx1 infoB), H(tx2 infoA), H(tx2 infoB)...). Scanning for a new wallet entails requesting from some other node just the tx infoA from each block (and tx infoB if some of it has been pruned [note that if some has been pruned, ALL should be requested to prevent other nodes from getting insight into the subset of unpruned outputs]; if not pruned, just get it from storage), in addition to the block ID. To verify all received information is legitimate, reconstruct the merkle tree and check that the previously stored merkle roots match. Note that other information related to scanning must be stored in order to construct transactions. - Note: a merkle tree is a series of binary hash branches, although I don't see a reason that is strictly necessary, as most blockchain operations for this kind of node will be over large chunks of data at a time. Moreover, it isn't wise to request specific historical transactions/blocks from other nodes since that reveals something about what our light node is up to (e.g. might imply that one of our owned outputs is in the specific request), so any request for data should be part of largish bins.
Creating a new light node, or adding recently mined blocks to it, involves downloading blockchain data, extracting all useful information, and discarding the rest. Prunable transaction data does not need to be downloaded, just its hash which goes into the ordinary transaction merkle tree as part of verifying a block's proof of work.
Information stored
- block id merkle checkpoints: after verifying the proof of work on a group of blocks (e.g. 1000 blocks at a time), create a merkle tree out of their block IDs and tx infos, and store the merkle root (these merkle roots could be organized into even larger merkle trees, e.g. 100 roots at a time, and then 10 roots on top of that with the highest level root encompassing 1 million blocks).
a) also keep track of difficulty parameters for verifying proof of work on new blocks; perhaps store ~1000 blocks explicitly in case of reorgs; the Lee-Miller paper recommends programatically extending the Merkle tree when adding new blocks, but I feel it's fine to just store up to 1500 blocks, and on the 1500th block turn the oldest 1k blocks into a merkle root; this way every multiple of 1k blocks has its own straightforward merkle root, and the node retains recent blocks in case of problems - info for creating any transaction: rct_offsets (cumulative output count by block), one time addresses and output commitments with their corresponding output indices (e.g. for ring member offsets)
- info about owned outputs: key images, transaction public keys, amounts, payment IDs, output indices from the tx they were created in (allows easy access to one time addresses and output commitments for later spending), spent/unspent designation
The number of owned outputs is negligible in comparison to the total number of outputs, so by using the block ID merkle checkpoint method the vast majority of data stored will be one time addresses and output commitments for outputs. This turns out to be ~2 GB of storage per 30 million outputs on the blockchain.
Note that bitcoin currently has 500 million transactions on the blockchain, which implies around 800-900 million outputs, so our light node would require ~60 GB, or about 20% of Bitcoin's full size.
Extensions
These are possible options a light node operator could be given.
- The amount of basic storage could be greatly reduced even further by randomly discarding ~75-95%% of all outputs, since it's likely that only a small portion will ever actually need to be used in making a transaction (this assumption would make porting the node to a new wallet more difficult; care would have to be taken not to discard too much in the case of next generation ring signatures that use advanced binning techniques, in addition to generally larger ring sizes). As the volume of transaction grows, the proportion of discarded outputs can also grow, since from the users point of view the amount of outputs necessary for transactions will remain constant.
a) Light nodes can always request data from other nodes, in case it pruned too much (or in case of using the same light node with a new wallet, which would involve at least a partial sync). - The light node, after its initial setup, can be set to validate the entire blockchain 'in the background'. This means, whenever it is given access to the network, it uses some non-intrusive percentage of CPU power to slowly validate every single block starting from the genesis message. The user can 'check up' on how many blocks have been validated, but that process doesn't take up too much storage since blocks are always discarded as soon as their merkle checkpoint has been verified. (inspired by Issue #54)
a) Similarly, after the initial setup, a light node can initiate a background double spend check by requesting a list of all block headers and tx infos. Once it sees that, up to the current height, no double spends have occurred, it can discard all that excess information. The light node may periodically perform double spend checks automatically, or on user request. - The light node can participate in validating new transactions and blocks, accepting connections from other nodes, checking the content, then extracting the info it needs before passing along to other nodes. Note that it would not be possible for light nodes to send historical data to other nodes, since it's all discarded. I don't see this as a loss since as long as all blocks are validated by many nodes (light or otherwise), there don't need to be a huge amount of complete copies of chain data. Just enough for reliable redundancy.
a) To facilitate validating blocks, the light node could optionally keep the key image set in storage, which would be about 1GB per 30 million key images. It may also choose to not check for double spend when validating new blocks. - Note that a light node on mobile device would be at risk of using enormous amounts of phone service data, so design of such a node should be extra careful not to overwhelm users bank accounts. This also goes for users running limited wifi data plans, and so on.
- The next generation of Monero's transaction protocol, which will likely mean larger ring sizes, may involve advanced deterministic binning techniques, where ring membership is indicated by bin IDs instead of explicit output offsets. I don't expect a light node implementation would be too much impacted by that development.
I would like to bring to your attention that a lot of work has been done to introduce new authenticated data structures and accumulators that have superseded the work in the Lee-Miller paper. Some examples include: