matrix-org/sliding-sync

Dropping the database can drop to-device messages going forwards

Closed this issue · 0 comments

Ideally the database should not be dropped, but there may be circumstances where a server admin wants to do this. Currently, we do not provide guidelines on what should and should not be kept, beyond "keep to-device events and device data or else E2EE breaks". Even that guidance has been informal and is not written down anywhere.

Unfortunately, after debugging another cause of unable-to-decrypt messages (UTDs), it has come to my attention that this is insufficient. It is critical that the syncv3_to_device_messages_seq sequence IS NOT DROPPED. If it is, to-device events will be dropped and never delivered to the client, causing UTDs. The reasoning is as follows:

  • The sequence provides the positions in the stream of to-device events.
  • When the client asks for to-device events, the proxy delivers all to-device events > the client-provided position. This position defaults to 0.
  • If the sequence gets dropped, the positions reset to 0, but the client is still using higher positions.
  • This means no new to-device events are delivered (as the new events are <= the client-provided position).
  • Worst of all, the new to-device events are deleted, because acknowledging to-device events deletes all events <= the client-provided position, and the new events fall in that range.

This will persistently cause UTDs until the sequence has caught up to the client-provided position.
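
The failure mode described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the proxy's actual code: the proxy type, its insert and sync methods, and lostAfterReset are hypothetical names, and the real proxy stores events in Postgres rather than in a slice. The model assumes only the two behaviours listed above: events with a position <= the client-provided position are deleted as acknowledged, and only events with a greater position are delivered.

```go
package main

import "fmt"

// event is a to-device message tagged with its stream position.
type event struct {
	pos  int64
	body string
}

// proxy is a hypothetical in-memory stand-in for the to-device table
// and the sequence that hands out positions.
type proxy struct {
	seq    int64   // last value handed out by the sequence
	events []event // stored events that have not been acknowledged yet
}

// insert stores a new event at the next sequence position.
func (p *proxy) insert(body string) {
	p.seq++
	p.events = append(p.events, event{pos: p.seq, body: body})
}

// sync treats every event <= since as acknowledged (deleting it) and
// returns the events with a position > since.
func (p *proxy) sync(since int64) []event {
	var kept []event
	for _, e := range p.events {
		if e.pos > since {
			kept = append(kept, e)
		}
	}
	p.events = kept
	return append([]event(nil), kept...)
}

// lostAfterReset models a proxy whose sequence was dropped (restarting
// positions) while the client still holds the old position clientPos.
// It sends newEvents events, syncing after each one, and counts how
// many were delivered and how many were deleted undelivered.
func lostAfterReset(clientPos int64, newEvents int) (delivered, lost int) {
	p := &proxy{} // fresh sequence: positions restart at 1
	for i := 0; i < newEvents; i++ {
		p.insert(fmt.Sprintf("event-%d", i+1))
		got := p.sync(clientPos)
		if len(got) == 0 {
			lost++ // deleted as "acknowledged" without being delivered
			continue
		}
		delivered += len(got)
		clientPos = got[len(got)-1].pos // client advances its position
	}
	return delivered, lost
}

func main() {
	delivered, lost := lostAfterReset(5, 8)
	fmt.Printf("delivered=%d lost=%d\n", delivered, lost)
}
```

In this model, a client holding position 5 loses the first five events sent after the reset and only starts receiving events again once the sequence passes 5, which matches the "until the sequence has caught up" behaviour described above. A client starting from the default position 0 loses nothing.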