IntersectMBO/cardano-ledger

Support replaying with less memory

kderme opened this issue · 8 comments

When a block is reapplied, the ledger only generates the new state and doesn't validate the given block/txs. This makes the process faster than normal application, but still uses the same memory, since the result ledger state is the same. This is a desired property for the node, however for clients that only replay the chain for its staking and governance data and events, some parts of the ledger are never actually used. This includes the multiassets, payment credential and other parts of the UTxO set, like inline scripts, datums etc.

This feauture request proposes the addition of a ledger flag, which changes the reapplication ledger state result in terms of these data.

An open question is if this can be supported without major changes to ledger. My assumption is that it can. The BabbageTxOut contains a number of constructors. The flag could force the usage of only one or a couple of these constructors, while leaving fields like multiassets and stake creadential empty. A new constructor hopefully is not necessary, since it could affect the performance of the node.

To define more formally what is requested, given the existing block reapplication:

reapply:: State -> Block -> (State, Events)

We want to find a function t, that trims the UTxO set as much as possible

t :: State -> State

and define a new function

reapply':: State -> Block -> (State, Events)

such that

if
(s', e) == reapply s b
then
(t s', e) == reapply' (t s) b

Clients like db-sync will be mostly benefited from this, however it could be used to test an important property that ledger and node relies on: The ledger reapplication should work and reach a correct ledger state without relying on unecessary data. Also making it easier to create a ledger state could be used for debugging.

I am wondering if there is a real reason to pursue this approach with UTxOHD being around the corner? I assume it would solve this issue of unnecessary memory overhead, right?

An open question is if this can be supported without major changes to ledger. My assumption is that it can.

Unfortunately support for this would require significant changes. It could theoretically be done, but it would add quite a bit of complexity to the already complex rules, because we would have to track in the ledger rules all the parts that would need to not be applied to the ledger state on a specific flag. The biggest reason why I would oppose this change is that, besides ignoring certain state modifications, we would also have to perform some validations conditionally on this flag, which all together sounds too dangerous to me.

I think I can suggest an alternative approach that might work for db-sync. Namely to employ some postprocessor that would strip out the unnecessary parts from the state after applying a block. The only required trick for this to work is that all ledger validation has to be turned off, i.e reapplyBlock is not sufficient, because it still performs some validations. So applyBlock has to be called with NoValidation, cause otherwise predicate failures like ValueNotConservedUTxO will be triggered. In other words it would work like that:

  1. apply a block without validations to a stripped down state
  2. look into transactions in the block and collect redundant data that they could have added to the state
  3. remove that data from the state.

My suggestion would allow for ledger not to worry about this optimization that is not relevant for the chain safety and performance and it would allow for the team to stay sane working on those rules.

The ledger reapplication should work and reach a correct ledger state without relying on unecessary data. Also making it easier to create a ledger state could be used for debugging.

This use case is not accurate at all. We do need all data in order to reach the correct state. It is just that db-sync's view differs from the one that node and ledger have about what correct state actually means 🙂

There is plenty of data on chain that is not relevant for ledger, eg aux data, anchors, etc. Almost none of it is stored in the ledger state precisely because it is not relevant for ledger.

@kderme I am closing this ticket as won't fix. Please feel free to reopen it if this is still something you'd like pursue. We'd have to come up with a different approach, however, if this is a problem for db-sync.

Reopening this to ask some clarifications

Namely to employ some postprocessor that would strip out the unnecessary parts from the state after applying a block.

We could try to implement it on the db-sync side, by doing some preprocessing instead as @sgillespie mentioned: we could adjust the block by trimming unnecessary parts before applying it. This would give us the same result without multiasset, payment credential etc. I'm mostly concerned about the correctness of this approach.

The only required trick for this to work is that all ledger validation has to be turned off, i.e reapplyBlock is not sufficient, because it still performs some validations. So applyBlock has to be called with NoValidation, cause otherwise predicate failures like ValueNotConservedUTxO will be triggered.

I think this is not actually an issuem since reapplyBlock does no validation already (note that reapplyTx does some validaiton but this is different).

we could adjust the block by trimming unnecessary parts before applying it

That would work even better then if we were to change ledger rules or do some sort of postprocessing. I see no issues with such approach.

reapplyBlock does no validation already

Yes reapplyBlock does no validation. I was indeed thinking of repplyTx, when making this comment

Great, we just need to validate some properties then:

  • we can construct txs whose memo bytes and hash doesn't match the raw body, possible with an "unsafe" constructor
  • we can construct blocks from txs that have not been validated in the mempool. I think this is already possible through unsafeMakeValidated and mkShelleyValidatedTx from consensus.
  • Having a TxBodyRaw which doesn't correspond to the body bytes doesn't confuse the ledger rules, eg the hash of the original bytes are used to create the UTxO and there is no rehashing of the raw body.

@sgillespie and I will try to validate and complete this list of properties

we can construct txs whose memo bytes and hash doesn't match the raw body, possible with an "unsafe" constructor

As far as I can tell, there's no way to do this without recalculating the MemoBytes. The AlonzoTxBody constructor is only exposed by a pattern which recalculates the MemoBytes. The lenses won't help either, since they also call mkMemoBytes.

we can construct blocks from txs that have not been validated in the mempool

I don't see how the Mempool or Validated plays in here. I should be able to create a block using its exported constructor. I cannot, however, construct an AlonzoTxSeq without re-encoding

lehins commented

Yes, inability of ledger users to control MemoBytes is the protection that we have, but it is also the thing that will prevent you form mocking with transactions and blocks.

Loosing original binary representation would not be a problem for ledger rules for the most part, since disabling validation would mostly ignore any inconsistencies likes signature validation etc, but the problem is with the TxId for the UTxO and Governance, it would totally mess up identification of outputs and governance actions.

I am a little skeptical about allowing changes to the transaction without affecting the binary representation, but maybe we could export some sort of dangerous internal modifiers that would allow for advanced users like yourselves to shoot themselves in the foot 😄