openvstorage/alba

optimize maintenance rebalance/re-replicate with direct asd-to-asd communication


domsj commented

Rebalance can be optimized by having the too-full asd send the fragment data directly to the not-yet-full-enough asd.

Similarly, for repair under a replication policy it should be possible to send the fragment data directly between the asds.
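
For illustration, a rough sketch of the idea (this is hypothetical, not the real asd protocol or alba API; all type and function names below are assumptions). It contrasts the current rebalance data path, where the fragment passes through the maintenance process, with a direct asd-to-asd push:

(* hypothetical sketch, not the actual alba/asd interface *)
type asd = {
  address : string;
  get_fragment : fragment_id:string -> bytes;
  put_fragment : fragment_id:string -> bytes -> unit;
  (* proposed extension: ask this asd to push a fragment to another asd *)
  push_fragment_to : fragment_id:string -> target:string -> unit;
}

(* today: the fragment crosses the network twice (src -> maintenance -> dst) *)
let rebalance_via_maintenance ~src ~dst ~fragment_id =
  let data = src.get_fragment ~fragment_id in
  dst.put_fragment ~fragment_id data

(* proposed: maintenance only coordinates, the data goes src -> dst directly *)
let rebalance_direct ~src ~dst ~fragment_id =
  src.push_fragment_to ~fragment_id ~target:dst.address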

Quite some impact from maintenance trying to rebalance an asymmetric backend (1 extra disk in 1 node in this setup, but extra empty disks/nodes would show similar behaviour).

Throughput from dd in a VM via the edge (each line: epoch timestamp, run number, elapsed time, throughput):

maintenance off:

...
1494507344 5 16.3308 s 131 MB/s
1494507361 6 17.5935 s 122 MB/s
1494507380 7 18.4305 s 117 MB/s

maintenance on:

1494507400 8 27.6425 s 77.7 MB/s
1494507428 9 50.8574 s 42.2 MB/s
1494507480 10 32.9535 s 65.2 MB/s
1494507514 11 44.0469 s 48.8 MB/s

maintenance off:

1494507561 12 17.793 s 121 MB/s
1494507580 13 17.5795 s 122 MB/s
1494507599 14 18.3454 s 117 MB/s
1494507618 15 17.8269 s 120 MB/s
1494507637 16 19.3551 s 111 MB/s

maintenance on:

1494507657 17 49.1189 s 43.7 MB/s
1494507709 18 42.4096 s 50.6 MB/s
1494507753 19 41.4204 s 51.8 MB/s
1494507797 20 33.4441 s 64.2 MB/s

Network without maintenance:

       eth2       
 KB/s in  KB/s out
147599.3  54549.87
189334.6  54740.97
199151.6  54746.13
167426.1  16094.95
222470.0  39147.36
219685.5  54783.22
206448.5  54979.32
330253.6  55099.64
185136.3  37519.75

Network with maintenance:

       eth2       
 KB/s in  KB/s out
409364.7  505545.1
433639.6  513500.1
418442.1  570177.3
454566.2  498419.2
453942.3  513534.5
420513.6  435244.4
438833.4  466755.9
473411.2  526030.5
518031.7  369577.4
489400.4  430726.2

Maybe rebalancing should not be enabled by default, given that the network (and disk) bandwidth it consumes is lost for ingest?

Is the time/work spent moving old data around really worth the effort? This probably also depends on the use case; with a constant ingest things might be different than with a bursty one...

Maybe the decision of when to move data around, and from where to where, also needs more thoughtful insight (policies used / capacity planning / ...) than the maintenance process itself has?

ps/ rebalancing can be turned off via

alba update-maintenance-config --disable-rebalance --config <abm-configurl>

Isn't there a way to limit the impact of rebalancing (lowering its priority) so that there is still some rebalancing going on?
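
As an illustration of what such a limit could look like (not an existing alba option; the names and numbers are assumptions), a small token-bucket limiter could cap the bytes/second the maintenance process spends on moving fragments, keeping rebalance well below the ~120 MB/s the ingest path reaches without maintenance:

(* sketch only, not an alba feature; needs the unix library *)
type throttle = {
  rate : float;            (* allowed bytes per second *)
  mutable tokens : float;  (* currently available byte budget *)
  mutable last : float;    (* timestamp of the last refill *)
}

let make_throttle rate =
  { rate; tokens = rate; last = Unix.gettimeofday () }

(* block until [nbytes] fit in the budget (assumes nbytes <= rate),
   refilling tokens as time passes *)
let rec consume t nbytes =
  let now = Unix.gettimeofday () in
  t.tokens <- min t.rate (t.tokens +. ((now -. t.last) *. t.rate));
  t.last <- now;
  if t.tokens >= nbytes
  then t.tokens <- t.tokens -. nbytes
  else begin
    Unix.sleepf ((nbytes -. t.tokens) /. t.rate);
    consume t nbytes
  end

(* e.g. cap rebalance at 50 MB/s and account for each 4 MB fragment moved *)
let () =
  let t = make_throttle 50e6 in
  consume t 4e6

Limiting how many fragments rebalance is allowed to move per unit of time would have a similar effect.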

waiting on QA effort.