jupyterlab/jupyter-collaboration

Automatic file save strategies

Opened this issue · 4 comments

In #241, there is some discussion happening about what strategy we should use for automatically saving the document. I think it's worth opening an issue here to discuss further.

There are two strategies currently mentioned:

  1. Trigger a save after each user action/activity + a document_save_delay (configurable by the user).
  2. Save on a regular interval (configurable by the user).

(1) is used today and follows the same pattern as e.g. Google Docs, so it may better match users' expectations of RTC apps. But it comes with some downsides. First, during a period of heavy activity, the document may not be saved to disk for a while, since each pending save task is cancelled by the next change before it can run. Second, if changes come in at an interval just longer than the document_save_delay, the document may be saved more often than desired.
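To make the two downsides concrete, here is a small simulation sketch of strategy (1). It is not the extension's actual code: the function name `debounce_save_times` is hypothetical, times are in milliseconds, and I'm assuming an edit that arrives exactly when a save is due does not cancel it.

```python
def debounce_save_times(edit_times, save_delay):
    """Return the times at which a debounced save would actually run.

    Assumes edit_times is sorted ascending. Each edit cancels the pending
    save and schedules a new one at edit_time + save_delay.
    """
    saves = []
    for i, t in enumerate(edit_times):
        fire_at = t + save_delay
        next_edit = edit_times[i + 1] if i + 1 < len(edit_times) else None
        # The save survives only if no later edit arrives before it fires.
        if next_edit is None or next_edit >= fire_at:
            saves.append(fire_at)
    return saves


# Ten edits, 900 ms apart, with a 1000 ms delay: only one save, at the very end.
print(debounce_save_times(list(range(0, 9000, 900)), 1000))  # [9100]

# Edits 1100 ms apart: every single edit triggers its own save.
print(debounce_save_times([0, 1100, 2200], 1000))  # [1000, 2100, 3200]
```

The first call shows the starvation case, the second shows the "saved more often than desired" case.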

(2) is simpler to implement. The downside is that it is rather inflexible and won't tune itself to the number of edits coming in from multiple users at once.

Thoughts?

Personally, I'm leaning towards what @davidbrochart said in #241.

We should trigger on user activity rather than on a regular interval, since this is what users expect of RTC apps.

That said, I think we can achieve something in the middle of (1) and (2).

Instead of always applying the save_delay, do this:

since_last_save = current_time - last_save

if since_last_save > document_save_delay:
    # save immediately
    ...
else:
    # save after (document_save_delay - since_last_save)
    ...

This would achieve (1) without the fear of your first downside (not saving often enough). Then, we give the consumers of this library the ability to tune the document_save_delay argument to avoid the second downside based on their own system demands.
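As a rough sketch of how this would behave over a stream of edits, here is a small simulation. The function name `throttled_save_times` is hypothetical, times are in milliseconds, and I'm assuming that a document with no previous save is saved on the first edit and that an already-scheduled save picks up any edits that arrive before it runs.

```python
def throttled_save_times(edit_times, document_save_delay, last_save=None):
    """Return save times when saves are spaced at least document_save_delay apart.

    Assumes edit_times is sorted ascending.
    """
    saves = []
    pending = None  # time of the save currently scheduled, if any
    for t in edit_times:
        # Run the scheduled save if it was due at or before this edit.
        if pending is not None and pending <= t:
            saves.append(pending)
            last_save, pending = pending, None
        if pending is not None:
            continue  # a save is already scheduled and will include this edit
        if last_save is None or t - last_save > document_save_delay:
            # Enough time has passed since the last save: save immediately.
            saves.append(t)
            last_save = t
        else:
            # Too soon: save once document_save_delay has elapsed since last_save.
            pending = last_save + document_save_delay
    if pending is not None:
        saves.append(pending)  # the last scheduled save still runs
    return saves


# Ten edits, 900 ms apart, with a 1000 ms delay: one save per second.
print(throttled_save_times(list(range(0, 9000, 900)), 1000))
# [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000]
```

Under constant editing the document is written about once per document_save_delay, so the gap between saves stays bounded instead of growing with the length of the burst.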

Your suggestion means that we would sometimes save, knowing that we might save again 1 second later (by default), which seems unnecessary. You might say that if a change occurs every 0.9 seconds then we will never save, but on the other hand, if users constantly change a document, it really means that it's not worth saving yet. There should be no fear of losing data: it is there in the backend and in the YStore database. The only risk of not saving to disk is if the server goes down, but this is no different from the non-RTC scenario.

I guess the "dial" I want to be able to control, from the perspective of a user of this feature, is a "minimum save rate" parameter, not a save delay.

Instead of always saving after delay X, try to save immediately unless the time since the last save is less than the "minimum save rate"; in that case, wait until that much time has passed since the last save.

Minimum save rate feels like a more intuitive setting than "save delay", which could keep punting the save later and later.

I think @Zsailer's proposal is a reasonable tradeoff. @dlqqq and I have also been talking about the advantages of not saving the YStore to disk (at least in some situations), so I don't want to rely too much on that side of things to prevent data loss. I also think the mental model that @Zsailer is proposing of a "minimum save rate" is a nice way to think about the guarantee we are giving to the user.