openvstorage/alba

excessive alba maintenance memory usage

Opened this issue · 7 comments

domsj commented

I saw it go up to 13 GB when it had a lot of work to do (30k work items).
It would be nice if the user could express the max amount of memory maintenance is allowed to use.
Customers want to limit this, so their node doesn't go into swap unexpectedly.

The maintenance can now run on a dedicated machine, so this has become less urgent.

We had the same issue at GIG and are seeing it on OVH right now. On OVH we have set up a cron job that restarts the maintenance agent every hour because of this excessive memory usage.
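As an alternative to an unconditional hourly restart, the restart could be made conditional on actual memory use. The following is only a minimal sketch, assuming a Linux host where `/proc/<pid>/status` is readable. The 8 GiB limit and all function names are hypothetical, and the restart itself is left to the caller because the thread does not name the service unit.

```python
import re

# Hypothetical threshold; choose a value that fits the node.
RSS_LIMIT_KIB = 8 * 1024 * 1024  # 8 GiB expressed in KiB


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the resident set size of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vmrss_kib(handle.read())


def should_restart(rss_kib, limit_kib=RSS_LIMIT_KIB):
    """True only when the RSS is known and exceeds the limit."""
    return rss_kib is not None and rss_kib > limit_kib
```

A cron job could call `should_restart(read_rss_kib(pid))` every few minutes and restart the maintenance agent only when it returns `True`, which avoids interrupting the agent while its memory usage is still acceptable.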

domsj commented

This will be fixed / significantly improved by https://github.com/openvstorage/alba_ee/pull/93

Still an issue

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                       
28003 root      20   0 30.098g 0.026t  20528 S  20.4 14.1   4419:02 /usr/bin/alba maintenance --config arakoon://config/ovs/alba/backends/cf83e8c7-9225-47ac-acd6-362ff519374c/maintenance/config?ini=%2Fopt%2Fasd-manager%2Fcon+ 
28035 root      20   0 21.338g 0.019t  20752 S  13.0 10.2   3559:12 /usr/bin/alba maintenance --config arakoon://config/ovs/alba/backends/99a443f7-304d-4078-8158-8c0631e963f6/maintenance/config?ini=%2Fopt%2Fasd-manager%2Fcon+ 
 6123 root      20   0 7868220 6.825g  21408 S  25.9  3.6   3647:25 /usr/bin/alba maintenance --config arakoon://config/ovs/alba/backends/e74c1ed4-9f71-4514-8436-8d9c005c9450/maintenance/config?ini=%2Fopt%2Fasd-manager%2Fcon+ 
21243 root      20   0 4785828 4.136g  21184 S   7.4  2.2 164:59.10 /usr/bin/alba maintenance --config arakoon://config/ovs/alba/backends/804d3955-d261-4667-9c6b-a819483169d5/maintenance/config?ini=%2Fopt%2Fasd-manager%2Fcon+ 
28025 root      20   0 3677188 3.020g  20868 R  25.9  1.6   5954:57 /usr/bin/alba maintenance --config arakoon://config/ovs/alba/backends/98e73917-0476-4995-b4db-96e29ffbe2a5/maintenance/config?ini=%2Fopt%2Fasd-manager%2Fcon+ 
28005 root      20   0 6108856 2.921g  20736 S  14.8  1.5   4184:05 /usr/bin/alba maintenance --config arakoon://config/ovs/alba/backends/f9599945-f1f5-44d3-8497-0c462ede4ef9/maintenance/config?ini=%2Fopt%2Fasd-manager%2Fcon+ 
28019 root      20   0 3637700 2.181g  20744 S  11.1  1.2   3683:34 /usr/bin/alba maintenance --config arakoon://config/ovs/alba/backends/56f58646-419d-4236-a868-e3b79ac8784d/maintenance/config?ini=%2Fopt%2Fasd-manager%2Fcon+ 
25120 root      20   0  972444 553640  21408 S   5.6  0.3  24:42.07 /usr/bin/alba maintenance --config arakoon://config/ovs/alba/backends/ffeb4668-5cd6-4a54-8b8e-718b3ad97a45/maintenance/config?ini=%2Fopt%2Fasd-manager%2Fcon+ 
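The RES column above mixes units, which makes the rows hard to compare at a glance. A small sketch for normalising them, assuming procps `top` scaling (no suffix means KiB; `m`, `g` and `t` mean MiB, GiB and TiB); the helper name is made up for illustration.

```python
# Binary scale factors relative to KiB, matching top's suffixes.
_SUFFIX_TO_KIB = {"m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}


def top_mem_to_gib(field):
    """Convert a top memory field such as '553640', '6.825g' or '0.026t' to GiB."""
    field = field.strip().lower()
    factor = _SUFFIX_TO_KIB.get(field[-1])
    if factor is None:
        kib = float(field)  # no suffix: the value is already in KiB
    else:
        kib = float(field[:-1]) * factor
    return kib / 1024 ** 2


# RES values copied from three of the rows above.
for pid, res in [(28003, "0.026t"), (6123, "6.825g"), (25120, "553640")]:
    print(pid, round(top_mem_to_gib(res), 2), "GiB")
```

Under that assumption, the largest process (PID 28003, RES `0.026t`) is holding roughly 26.6 GiB resident, while the smallest (PID 25120) is at about 0.5 GiB.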

During a test on our pocops environment, the updated Alba with async enabled showed no improvement in memory usage. In fact, memory usage increased (by 50%), as you can see in the graph below.
image
The left side shows memory usage before the update; the right side shows memory usage after the update.
Async ASD was enabled in the service file:
Environment=ALBA_USE_ASYNC_ASD_PROTOCOL=yes
Alba version is 'ee-1.6.1'.

@domsj what is the status of this? Do we still experience this issue?

Yes, we still have the issue.