datto/dattobd

When snapshot sda2, does sda2 need to change its mrf if it's already changed?

nickchen-cpu opened this issue · 3 comments

Hi

Disk conf: sda1, sda2
Steps:

  1. First snapshot sda1;
  2. In __tracer_transition_tracing , sda1->sda->queue->make_request is replaced with tracing_mrf, and so is sda2->sda->queue->make_request;
  3. Snapshot the sda2, here's the point: We still run __tracer_transition_tracing for sda2, which will again:
    (1) Freeze the sda;
    (2) Change sda2->sda->queue->make_request to tracing_mrf. but it already is in step 2.

Could we do some precheck to prevent this?

Best Regards,
Nick

The double assignment isn't a problem. However, when you'd go to destroy those devices we'd certainly have a problem. When we snapshot sda2, we record the original mrf so that we can restore it later; this causes the problem since at this point sda2 original will have been tracing_mrf. If the destroys come in the same sequence as the snapshots, then sda will end up with tracing_mrf as its callback after there are no snap devices associated with it. In summary, the following would happen:

  1. snapshot sda1 - orig_mrf is recorded normally, tracing_mrfis swapped in
  2. snapshot sda2 - orig_mrf is set to tracing_mrf, tracing_mrf is swapped in
  3. destroy sda1 - orig_mrf (which is the actual fn pointer) is swapped back into sda->queue->make_request
  4. destroy sda2 - orig_mrf (which is tracing_mrf is swapped back into sda->queue->make_request

And this small problem gets larger if the module is unloaded!

Good catch! Initial idea for resolving would be a kind of reference count for mrf swaps. We could create a couple of functions gendisk_mrf_get and gendisk_mrf_put and a map that associate gendisk* and orig_mrfs along with how many times they've been accessed.

snapshot sda1 - orig_mrf is recorded normally, tracing_mrfis swapped in
snapshot sda2 - orig_mrf is set to tracing_mrf, tracing_mrf is swapped in
destroy sda1 - orig_mrf (which is the actual fn pointer) is swapped back into sda->queue->make_request

After sda1 is out of tracing(because make_requset is assigned back to orig_mrf), the tracing function of sda2 is also disabled, because they share the same device queue, right?

Indeed. Thank you @nickchen-cpu