Use tx link instead of tx hash for input_point.
evoskuil opened this issue · 2 comments
When a tx is stored it has already been fully validated, which means its previous outputs have been looked up. That lookup requires obtaining the tx.link for each prevout hash, so the link is already in hand at store time. Storing the link instead of the hash saves 32-8 (24) bytes per input, and more (27) when tx slabs are converted to 5 byte storage. This is a material storage savings, and the smaller record is also less costly (faster) to write, which improves network sync and overall performance (including full block validation).
TODO: measure total number of inputs to determine actual space savings.
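The arithmetic above can be sketched as follows. This is only an illustration: the constant and function names are made up for the example, and `input_count` is a placeholder for the still-unmeasured number of inputs.

```cpp
#include <cstdint>

// Sizes in bytes, taken from the description above.
constexpr std::uint64_t hash_size = 32;         // prevout tx hash
constexpr std::uint64_t link_size_current = 8;  // tx link today
constexpr std::uint64_t link_size_compact = 5;  // tx link after 5 byte conversion

// Bytes saved per input by storing a link of the given size instead of the hash.
constexpr std::uint64_t saved_per_input(std::uint64_t link_size)
{
    return hash_size - link_size;
}

// Total bytes saved for a store holding input_count inputs.
// input_count is hypothetical here: the real value has to be measured.
constexpr std::uint64_t total_saved(std::uint64_t input_count,
    std::uint64_t link_size)
{
    return input_count * saved_per_input(link_size);
}

static_assert(saved_per_input(link_size_current) == 24, "8 byte link");
static_assert(saved_per_input(link_size_compact) == 27, "5 byte link");
```

Once the input count is measured, `total_saved(count, link_size_current)` gives the space recovered with 8 byte links and `total_saved(count, link_size_compact)` the space recovered with 5 byte links.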
Whenever a tx is read for internal computations, the link provides a materially faster lookup of the actual tx than the hash (which must pass through the hash table, including any conflicts). The only exception is the case where the objective is to obtain the hash of the point. In this case the link must be traversed to obtain the hash. Apart from paging costs, the only additional material cost for this is the read of 8 more bytes (for the link). This would only impact queries (p2p/client-server) for blocks/txs (and possibly others that require the prevout hash).
The reduced net storage size may offset much of the additional paging cost, as the tx table would be significantly reduced in size. Also the paging is within the same table (self-referential), so the paging cost would be insignificant for recent prevouts (the more common scenario).
Given that tx input point links are self-referential the additional reads can be deferred into the tx result and input point iterator. As a result the abstraction remains clean while allowing the hashes to be populated as necessary.
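A minimal sketch of that deferral, assuming an in-memory vector as a stand-in for the tx table. None of these types are the actual store classes: `tx_table`, `stored_point` and `point_view` are hypothetical names for illustration, and null (coinbase) points are not handled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using hash_digest = std::array<std::uint8_t, 32>;
using link_type = std::uint64_t;  // position of a tx record in the table

// Stored form of an input point: link to the prevout tx plus output index.
struct stored_point
{
    link_type tx_link;
    std::uint32_t index;
};

// Simplified tx record: its own hash and its input points.
struct tx_record
{
    hash_digest hash;
    std::vector<stored_point> inputs;
};

// Stand-in for the tx table, where a link is just a vector position.
class tx_table
{
public:
    link_type store(const hash_digest& hash, std::vector<stored_point> inputs)
    {
        records_.push_back({ hash, std::move(inputs) });
        return records_.size() - 1;
    }

    const tx_record& get(link_type link) const
    {
        assert(link < records_.size());
        return records_[link];
    }

private:
    std::vector<tx_record> records_;
};

// What a tx result / input point iterator could hand out.
// link() and index() cost nothing extra; only hash() traverses the link.
class point_view
{
public:
    point_view(const tx_table& table, const stored_point& point)
      : table_(table), point_(point)
    {
    }

    link_type link() const { return point_.tx_link; }
    std::uint32_t index() const { return point_.index; }

    // Deferred read: follows the self-referential link to the prevout tx.
    const hash_digest& hash() const { return table_.get(point_.tx_link).hash; }

private:
    const tx_table& table_;
    stored_point point_;
};
```

Internal computations would call `link()` and never touch the prevout record, while p2p/client-server queries that need the prevout hash call `hash()` and pay for the extra read only then.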
This same principle can be applied to the header.parent hash. The link is self-referential, and the parent is always looked up as part of validation. This would reduce each header index by about 1/3, and is the last remaining hash reference in the store apart from tx inpoints (above).
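The above tx inpoint optimization precludes parallel block download (a block stored out of order may spend outputs of txs that are not yet stored, so there is no link to record), so :<.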