openvstorage/alba

High cpu load on asds when deleting 1000 vdisks

Closed this issue · 3 comments

On the OVH setup we had a strange case.

The CPU of the most StorageVsa nodes (where the accel asd backend is located) was high (load 10+ on a 2 vcpu vm).

At tho moment we deleted 1000 vdisks the most asd from the accel backend are disqualified from the proxy.

In the ASD log file we checked the request apply took ....
The most request were applied < 1 sec but some times > 20 sec.

For example StorageVSA08:
CPU load:
cpu_load

Summary disk IO:
disk_io

Asd usage:
asd_usage

proxy logging:

Feb 20 23:58:40 ovs01 alba[49262]: 2017-02-20 23:58:40 461434 +0100 - ovs01 - 49262/0 - alba/proxy - 692237 - info - server: new client connection from (172.20.20.1,43290)
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 374023 +0100 - ovs01 - 49262/0 - alba/proxy - 692238 - info - Disqualifying osd 31: (Unix.Unix_error "Broken pipe" write ""); backtrace:; Raised at file "format.ml", line 241, charact
ers 41-52; Called from file "format.ml", line 482, characters 6-24
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 374112 +0100 - ovs01 - 49262/0 - alba/proxy - 692239 - warning - could not receive enough fragments for namespace 22805, object " \000\000\000m#\195e\211%\189\159i\018\166\255E\217\23
9\246\163\r\138\b;\164\217ot\160\140[I+T\172\000\000\000\000\001\000\000\000" ("2\177\023\b\221\022\143\002\167\t\252\b\170\208\255\nH`;{\214\176\210\b\211\196\189f\209\145+X") chunk 0; got 0 while 4 needed
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 374169 +0100 - ovs01 - 49262/0 - alba/proxy - 692240 - info - retrying : Alba_client_errors.Error.Exn(8); backtrace:; Raised at file "src/alba_client_errors.ml", line 34, characters 2
5-32; Called from file "src/alba_client_download.ml", line 385, characters 6-45; Called from file "src/core/lwt.ml", line 653, characters 66-69
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 374269 +0100 - ovs01 - 49262/0 - alba/proxy - 692241 - warning - could not receive enough fragments for namespace 22805, object " \000\000\000m#\195e\211%\189\159i\018\166\255E\217\23
9\246\163\r\138\b;\164\217ot\160\140[I+T\172\000\000\000\000\003\000\000\000" ("\216\252\223\207\000r\130\020}\155}\229\142\172\021l\147\218\152\236\143=\028\253\021\240VB\220g\154y") chunk 0; got 0 while 4 needed
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 374300 +0100 - ovs01 - 49262/0 - alba/proxy - 692242 - info - retrying : Alba_client_errors.Error.Exn(8); backtrace:; Raised at file "src/alba_client_errors.ml", line 34, characters 2
5-32; Called from file "src/alba_client_download.ml", line 385, characters 6-45; Called from file "src/core/lwt.ml", line 653, characters 66-69
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 381975 +0100 - ovs01 - 49262/0 - alba/proxy - 692243 - info - Disqualifying osd 25: (Unix.Unix_error "Broken pipe" write ""); backtrace:; Raised at file "format.ml", line 241, characters 41-52; Called from file "format.ml", line 482, characters 6-24
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 382052 +0100 - ovs01 - 49262/0 - alba/proxy - 692244 - warning - could not receive enough fragments for namespace 22805, object " \000\000\000m#\195e\211%\189\159i\018\166\255E\217\239\246\163\r\138\b;\164\217ot\160\140[I+T\172\000\000\000\000\006\000\000\000" ("\007\217\004/\"6\213\000b\158\139\222&@-s\t\192\167\150\"+aSb\215\147h>\173\233\156") chunk 0; got 2 while 4 needed
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 382086 +0100 - ovs01 - 49262/0 - alba/proxy - 692245 - info - retrying : Alba_client_errors.Error.Exn(8); backtrace:; Raised at file "src/alba_client_errors.ml", line 34, characters 25-32; Called from file "src/alba_client_download.ml", line 385, characters 6-45; Called from file "src/core/lwt.ml", line 653, characters 66-69
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 382287 +0100 - ovs01 - 49262/0 - alba/proxy - 692246 - info - Disqualifying osd 26: End_of_file; backtrace:; Raised at file "bytes.ml", line 219, characters 25-34; Called from file "src/pre_sexp.ml", line 86, characters 11-45
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 382652 +0100 - ovs01 - 49262/0 - alba/proxy - 692247 - warning - could not receive enough fragments for namespace 1350, object " \000\000\000\219\219\209\216\228\214c\127:\182\167\167\202\001\2252\027.G4?\134a\185$\197\141\n\025\015\141\134\000\000\000\000\005\000\000\000" ("\004\143\001\1427\016\190\155E`\148_\020\224\128\188x\150\180U\250#\162\025\235\006\248\179\181\171q\196") chunk 0; got 0 while 4 needed
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 382690 +0100 - ovs01 - 49262/0 - alba/proxy - 692248 - info - retrying : Alba_client_errors.Error.Exn(8); backtrace:; Raised at file "src/alba_client_errors.ml", line 34, characters 25-32; Called from file "src/alba_client_download.ml", line 385, characters 6-45; Called from file "src/core/lwt.ml", line 653, characters 66-69
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 385120 +0100 - ovs01 - 49262/0 - alba/proxy - 692249 - info - Could not yet requalify osd 26, trying again in 1.000000 seconds: (Unix.Unix_error "Connection refused" connect ""); backtrace:; Raised at file "format.ml", line 241, characters 41-52; Called from file "format.ml", line 482, characters 6-24
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 404129 +0100 - ovs01 - 49262/0 - alba/proxy - 692250 - info - Requalified osd 31
Feb 20 23:58:43 ovs01 alba[49262]: 2017-02-20 23:58:43 762394 +0100 - ovs01 - 49262/0 - alba/proxy - 692251 - info - Requalified osd 25
Feb 20 23:58:44 ovs01 alba[49262]: 2017-02-20 23:58:44 340682 +0100 - ovs01 - 49262/0 - alba/proxy - 692252 - info - Could not yet requalify osd 26, trying again in 1.500000 seconds: (Unix.Unix_error "Connection refused" connect ""); backtrace:; Raised at file "format.ml", line 241, characters 41-52; Called from file "format.ml", line 482, characters 6-24

asd logging:

Feb 20 23:58:04 storagevsa08 alba[1486]: 2017-02-20 23:58:00 446563 +0100 - storagevsa08 - 1486/0 - alba/asd - 448419 - info - Request Apply took 0.767189
Feb 20 23:58:04 storagevsa08 alba[1486]: 2017-02-20 23:58:00 446866 +0100 - storagevsa08 - 1486/0 - alba/asd - 448420 - info - Request Apply took 0.730920
Feb 20 23:58:04 storagevsa08 alba[1486]: 2017-02-20 23:58:00 448474 +0100 - storagevsa08 - 1486/0 - alba/asd - 448421 - info - Request Apply took 0.512208
Feb 20 23:58:04 storagevsa08 alba[1486]: 2017-02-20 23:58:00 896676 +0100 - storagevsa08 - 1486/0 - alba/asd - 448422 - info - Request Apply took 0.706091
Feb 20 23:58:04 storagevsa08 alba[1486]: 2017-02-20 23:58:00 899365 +0100 - storagevsa08 - 1486/0 - alba/asd - 448423 - info - Request Apply took 0.698886
Feb 20 23:58:04 storagevsa08 alba[1486]: 2017-02-20 23:58:00 899826 +0100 - storagevsa08 - 1486/0 - alba/asd - 448424 - info - Request Apply took 0.638166
Feb 20 23:58:04 storagevsa08 alba[1486]: 2017-02-20 23:58:01 447375 +0100 - storagevsa08 - 1486/0 - alba/asd - 448425 - info - Request Apply took 0.917083
Feb 20 23:58:04 storagevsa08 alba[1486]: 2017-02-20 23:58:01 447665 +0100 - storagevsa08 - 1486/0 - alba/asd - 448426 - info - Request Apply took 0.883411
Feb 20 23:58:04 storagevsa08 alba[1486]: 2017-02-20 23:58:01 692244 +0100 - storagevsa08 - 1486/0 - alba/asd - 448427 - info - Request Apply took 0.639645
Feb 20 23:58:04 storagevsa08 alba[1486]: 2017-02-20 23:58:01 692567 +0100 - storagevsa08 - 1486/0 - alba/asd - 448428 - info - Request Apply took 0.578368
Feb 20 23:58:04 storagevsa08 alba[1486]: 2017-02-20 23:58:01 692850 +0100 - storagevsa08 - 1486/0 - alba/asd - 448429 - info - Request Apply took 0.503718
Feb 20 23:58:06 storagevsa08 alba[1486]: 2017-02-20 23:58:06 705855 +0100 - storagevsa08 - 1486/0 - alba/asd - 448430 - info - Request Apply took 3.329500
Feb 20 23:58:06 storagevsa08 alba[1486]: 2017-02-20 23:58:06 711057 +0100 - storagevsa08 - 1486/0 - alba/asd - 448431 - info - server: new client connection from (172.20.20.3,50148)
Feb 20 23:58:06 storagevsa08 alba[1486]: 2017-02-20 23:58:06 902028 +0100 - storagevsa08 - 1486/0 - alba/asd - 448432 - info - Request Apply took 3.492746
Feb 20 23:58:06 storagevsa08 alba[1486]: 2017-02-20 23:58:06 908039 +0100 - storagevsa08 - 1486/0 - alba/asd - 448433 - info - Request Apply took 3.421134
Feb 20 23:58:06 storagevsa08 alba[1486]: 2017-02-20 23:58:06 908954 +0100 - storagevsa08 - 1486/0 - alba/asd - 448434 - info - Request Apply took 3.413975
Feb 20 23:58:07 storagevsa08 alba[1486]: 2017-02-20 23:58:06 911001 +0100 - storagevsa08 - 1486/0 - alba/asd - 448435 - info - Request Apply took 30.712333
Feb 20 23:58:07 storagevsa08 alba[1486]: 2017-02-20 23:58:06 911442 +0100 - storagevsa08 - 1486/0 - alba/asd - 448436 - info - Request Apply took 15.590540
Feb 20 23:58:07 storagevsa08 alba[1486]: 2017-02-20 23:58:06 912037 +0100 - storagevsa08 - 1486/0 - alba/asd - 448437 - info - Request Apply took 10.055910
Feb 20 23:58:07 storagevsa08 alba[1486]: 2017-02-20 23:58:07 153686 +0100 - storagevsa08 - 1486/0 - alba/asd - 448438 - info - Request Apply took 26.527847
Feb 20 23:58:07 storagevsa08 alba[1486]: 2017-02-20 23:58:07 154053 +0100 - storagevsa08 - 1486/0 - alba/asd - 448439 - info - Request Apply took 16.016686
Feb 20 23:58:07 storagevsa08 alba[1486]: 2017-02-20 23:58:07 154499 +0100 - storagevsa08 - 1486/0 - alba/asd - 448440 - info - Request Apply took 5.303531
Feb 20 23:58:07 storagevsa08 alba[1486]: 2017-02-20 23:58:07 579602 +0100 - storagevsa08 - 1486/0 - alba/asd - 448441 - info - Request Apply took 0.735050
Feb 20 23:58:07 storagevsa08 alba[1486]: 2017-02-20 23:58:07 594964 +0100 - storagevsa08 - 1486/0 - alba/asd - 448442 - info - Request Apply took 0.623358
Feb 20 23:58:07 storagevsa08 alba[1486]: 2017-02-20 23:58:07 953551 +0100 - storagevsa08 - 1486/0 - alba/asd - 448443 - info - Request Apply took 0.589259
Feb 20 23:58:07 storagevsa08 alba[1486]: 2017-02-20 23:58:07 973181 +0100 - storagevsa08 - 1486/0 - alba/asd - 448444 - info - Request Apply took 0.814359
Feb 20 23:58:07 storagevsa08 alba[1486]: 2017-02-20 23:58:07 973464 +0100 - storagevsa08 - 1486/0 - alba/asd - 448445 - info - Request Apply took 0.770293
Feb 20 23:58:07 storagevsa08 alba[1486]: 2017-02-20 23:58:07 997538 +0100 - storagevsa08 - 1486/0 - alba/asd - 448446 - info - Request Apply took 0.759556
Feb 20 23:58:08 storagevsa08 alba[1486]: 2017-02-20 23:58:07 997858 +0100 - storagevsa08 - 1486/0 - alba/asd - 448447 - info - Request Apply took 0.614799
Feb 20 23:58:08 storagevsa08 alba[1486]: 2017-02-20 23:58:07 998228 +0100 - storagevsa08 - 1486/0 - alba/asd - 448448 - info - Request Apply took 0.585061
Feb 20 23:58:09 storagevsa08 alba[1486]: 2017-02-20 23:58:08 801087 +0100 - storagevsa08 - 1486/0 - alba/asd - 448449 - info - Request Apply took 1.171140
Feb 20 23:58:09 storagevsa08 alba[1486]: 2017-02-20 23:58:08 801865 +0100 - storagevsa08 - 1486/0 - alba/asd - 448450 - info - Request Apply took 1.156205
Feb 20 23:58:09 storagevsa08 alba[1486]: 2017-02-20 23:58:08 803226 +0100 - storagevsa08 - 1486/0 - alba/asd - 448451 - info - Request Apply took 1.121276
Feb 20 23:58:09 storagevsa08 alba[1486]: 2017-02-20 23:58:08 803855 +0100 - storagevsa08 - 1486/0 - alba/asd - 448452 - info - Request Apply took 1.070673
Feb 20 23:58:09 storagevsa08 alba[1486]: 2017-02-20 23:58:08 804261 +0100 - storagevsa08 - 1486/0 - alba/asd - 448453 - info - Request Apply took 1.033457
Feb 20 23:58:09 storagevsa08 alba[1486]: 2017-02-20 23:58:08 804569 +0100 - storagevsa08 - 1486/0 - alba/asd - 448454 - info - Request Apply took 0.999246
Feb 20 23:58:09 storagevsa08 alba[1486]: 2017-02-20 23:58:09 422948 +0100 - storagevsa08 - 1486/0 - alba/asd - 448455 - info - Request Apply took 1.491454

@domsj anything needs to happen? If fixed please close.

After discussion with @domsj , some fixes are done, to be verified by QA

Closed by PM.