dymensionxyz/dymint

dymint says it submits batches but it actually doesn't


Didn't try to reproduce.

Log attached.
The suspicious error (from the logs):

time="Sep 14 09:53:58.760" level=error msg="ErrGroup goroutine.[err create and submit batch: submit batch: sl client submit batch: start height: 12: end height: 14: subscription cancelled]" module=block_manager

Ran with roller.

The first few state updates were submitted, and then it stopped.

This is the last line where we actually managed to submit a batch (i.e., "Batch accepted" was not logged after it):

time="Sep 14 09:53:36.801" level=info msg="Batch accepted.[startHeight 10 endHeight 11 stateIndex 5 dapath celestia|2702436|9|4|d4291abfd0c0a42448c08e6daf0c29e336bb41d1f75dbea72d0251f62f06b522|00000000000000000000000000000000000000990feb778b50a7fd724b|7af5eaa1efeb3ba7c5ab705464a1734900fe61c6f19125c341838967859a770c]" module=settlement_client

After this, the next batch (heights 12-14) was submitted to DA and broadcast to the hub, but the service stopped before it was accepted:

time="Sep 14 09:53:56.901" level=info msg="Submitted batch to DA.[start height 12 end height 14]" module=block_manager
time="Sep 14 09:53:56.961" level=info msg="Broadcasted batch[txHash CBD8D839976F3D15AE1A6F6B054DF84A7D6A380AF9CD0A44030670969A01D7D0]" module=settlement_client
time="Sep 14 09:53:58.760" level=info msg="service stop[msg Stopping Node service impl Node]"
srene commented

There are two problems here:
1. Batch 12-14 was actually submitted, but dymint did not see its inclusion on the hub because it was restarted before receiving confirmation. On restart it tries to submit a new batch (12-18), but the hub returns an error because heights 12-14 were already submitted.
2. The "batch already submitted" error is handled incorrectly: the error is logged, but the function returns a nil error. The submission process then sees no error and logs "Submitted batch to SL", even though it didn't submit anything.

srene commented

The problem seems to come from the restart being faster than the batch's inclusion on the hub. Once the sequencer starts, it syncs from the hub, but the last submitted height is still the old one (11). After that it receives a new-batch event. However, only full nodes are subscribed to those events for syncing. The sequencer therefore never updates its last submitted height, and all subsequent batches are rejected for having the wrong start height.
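The race can be reproduced in a toy model. The sketch below is a simplified illustration under stated assumptions and does not reflect dymint's actual types. The `hub` accepts a batch only if it starts right after its stored height. The `sequencer` syncs that height on restart. The in-flight 12-14 batch is included only afterwards. The subscription in `onNewBatch`, which lets the sequencer as well as full nodes consume new-batch events, is a hypothetical fix, and every type and function name here is invented for illustration.

```go
package main

import "fmt"

// hub is a toy settlement layer: a batch is accepted only if it starts
// right after the last height the hub has stored.
type hub struct {
	lastHeight  uint64
	subscribers []func(endHeight uint64) // hypothetical "new batch" event listeners
}

func (h *hub) include(start, end uint64) error {
	if start != h.lastHeight+1 {
		return fmt.Errorf("wrong start height: got %d, want %d", start, h.lastHeight+1)
	}
	h.lastHeight = end
	for _, notify := range h.subscribers {
		notify(end)
	}
	return nil
}

// sequencer tracks the last height it believes the hub has stored.
type sequencer struct {
	lastSubmitted uint64
}

// syncFromHub models what happens on restart: read the hub's current height.
func (s *sequencer) syncFromHub(h *hub) { s.lastSubmitted = h.lastHeight }

// onNewBatch is the hypothetical fix: the sequencer also advances its view
// when a new-batch event arrives.
func (s *sequencer) onNewBatch(end uint64) {
	if end > s.lastSubmitted {
		s.lastSubmitted = end
	}
}

// run replays the scenario from this issue, with or without the fix.
func run(subscribeSequencer bool) error {
	h := &hub{lastHeight: 11}
	s := &sequencer{}
	if subscribeSequencer {
		h.subscribers = append(h.subscribers, s.onNewBatch)
	}
	s.syncFromHub(h) // restart: the hub still reports 11

	// The 12-14 batch broadcast before the restart is included only now.
	if err := h.include(12, 14); err != nil {
		return err
	}

	// The sequencer builds its next batch from its own view of the hub.
	return h.include(s.lastSubmitted+1, 18)
}

func main() {
	fmt.Println("without subscription:", run(false))
	fmt.Println("with subscription:", run(true))
}
```

Without the subscription, the sequencer retries from height 12 and is rejected (the hub expects 15). With it, the sequencer moves to 14 and the 15-18 batch is accepted. Another option would be to re-query the hub's height whenever a submission is rejected.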