openvstorage/alba

albaproxy throws masterlookupresult errors when all the arakoons are still running

Closed this issue · 2 comments

When running a heavy fio test on the environment the healthcheck and the alba proxy throws an (info) error log.

All the Arakoons are still running without errors but what we saw in the arakoon logs is appending the tlog took + 6 sec for 259 B.

Same masterlookup error on the healthcheck: openvstorage/openvstorage-health-check#279

Proxy log:

Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 64027 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208339 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); back
trace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 66420 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208340 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); back
trace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 66495 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208341 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); back
trace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 67708 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208342 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); back
trace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 73785 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208343 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); back
trace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 77481 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208344 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); back
trace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 83058 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208345 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); back
trace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 85153 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208346 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); back
trace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 86638 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208347 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); back
trace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 86719 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208348 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); back
trace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 89460 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208349 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); back
trace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 93768 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208350 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); back
trace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 173367 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208351 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); bac
ktrace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46
Feb 09 10:01:59 stor-01.be-g8-4 alba[31635]: 2017-02-09 10:01:59 179252 +0100 - stor-01.be-g8-4 - 31635/0 - alba/proxy - 1208352 - info - Exception while adding object to alba fragment cache: Client_helper.MasterLookupResult.Error(0); bac
ktrace:; Raised at file "queue.ml", line 68, characters 17-22; Called from file "src/tools/lwt_pool2.ml", line 98, characters 25-46

Arakoon log:

Feb 09 10:01:56 stor-02.be-g8-4 arakoon[11369]: 2017-02-09 10:01:56 797656 +0100 - stor-02.be-g8-4 - 11369/0 - arakoon - 233371 - info - T:on_accept took: 5.566672 (519 B)
Feb 09 10:01:56 stor-02.be-g8-4 arakoon[11656]: 2017-02-09 10:01:56 797644 +0100 - stor-02.be-g8-4 - 11656/0 - arakoon - 13492 - info - T:on_accept took: 2.477003 (49 B)
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[15498]: 2017-02-09 10:01:57 41009 +0100 - stor-02.be-g8-4 - 15498/0 - arakoon - 25304 - info - TCP info for 10.109.3.41:26427: { state = 1; ca_state = 0; retransmits = 0; probes = 0; backoff = 0; options = 7; snd_wscale = 2; rcv_wscale = 7; rto = 204000; ato = 0; snd_mss = 8948; rcv_mss = 536; unacked = 0; sacked = 0; lost = 0; retrans = 0; fackets = 0; last_data_sent = 4284; last_ack_sent = 0; last_data_recv = 206925056; last_ack_recv = 4284; pmtu = 9000; rcv_ssthresh = 26880; rtt = 136; rttvar = 176; snd_ssthresh = 7; snd_cwnd = 10; advmss = 8948; reordering = 7; rcv_rtt = 0; rcv_space = 26880; total_retrans = 1 }
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[15498]: 2017-02-09 10:01:57 41089 +0100 - stor-02.be-g8-4 - 15498/0 - arakoon - 25305 - info - TCP info for 10.109.3.43:26427: { state = 1; ca_state = 0; retransmits = 0; probes = 0; backoff = 0; options = 7; snd_wscale = 7; rcv_wscale = 7; rto = 204000; ato = 0; snd_mss = 8948; rcv_mss = 536; unacked = 0; sacked = 0; lost = 0; retrans = 0; fackets = 0; last_data_sent = 4284; last_ack_sent = 0; last_data_recv = 206925056; last_ack_recv = 4284; pmtu = 9000; rcv_ssthresh = 26880; rtt = 56; rttvar = 25; snd_ssthresh = 45; snd_cwnd = 10; advmss = 8948; reordering = 13; rcv_rtt = 0; rcv_space = 26880; total_retrans = 1 }
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[11656]: 2017-02-09 10:01:57 44075 +0100 - stor-02.be-g8-4 - 11656/0 - arakoon - 13493 - info - eaAiuYTKKCZQyky0 is master
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[11369]: 2017-02-09 10:01:57 84410 +0100 - stor-02.be-g8-4 - 11369/0 - arakoon - 233372 - info - eaAiuYTKKCZQyky0 is master
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[17385]: 2017-02-09 10:01:57 98792 +0100 - stor-02.be-g8-4 - 17385/0 - arakoon - 10865 - info - T:on_accept took: 2.651100 (49 B)
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[15498]: 2017-02-09 10:01:57 98885 +0100 - stor-02.be-g8-4 - 15498/0 - arakoon - 25306 - info - T:on_accept took: 2.504548 (49 B)
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[13859]: 2017-02-09 10:01:57 98892 +0100 - stor-02.be-g8-4 - 13859/0 - arakoon - 44947 - info - T:on_accept took: 2.649449 (49 B)
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[12431]: 2017-02-09 10:01:57 98862 +0100 - stor-02.be-g8-4 - 12431/0 - arakoon - 236532 - info - T:on_accept took: 5.968652 (1295 B)
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 98792 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202830 - info - T:on_accept took: 6.221415 (259 B)
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 101036 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202831 - info - 9soWpxdBxc1uqqEP: election_suggest: n=66 i=171220 (Vc ([Sequence([Assert          ;"/nsm_host_version/";4; Replace            ;"/nsm/\199\003\000\000osds/deletes/\000\000\000V/p\000\000\000\000n\000\000\003\199r \000\000\000\002b\232\020}\217>\016\141fo\241\023\\_\135\149\030\209\n\149\020$\157;\147\183\244\216)\007\227\000\000\000\000\007\000\000\000\000\000\000\000";0; Replace            ;"/nsm/\199\003\000\000osds/deletes/\000\000\000V/p\000\000\000\000n\000\000\003\199o \000\000\000\002b\232\020}\217>\016\141fo\241\023\\_\135\149\030\209\n\149\020$\157;\147\183\244\216)\007\227\000\000\000\000\007\000\000\000\000\000\000\000";0]);],false)
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 101083 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202832 - info - CATCHUP start: I'm @ 171220 and eaAiuYTKKCZQyky0 is more recent
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 101094 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202833 - info - catchup_tlog 171220 from eaAiuYTKKCZQyky0
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 102100 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202834 - info - loop_entries
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 118782 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202835 - info - remote_nodestream :: last_seen = Some ("171411")
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 118805 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202836 - info - loop_parts done
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 118823 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202837 - info - catchup_tlog completed
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 118831 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202838 - info - CATCHUP phase 1 done (until_i = 171411); now the store
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 118837 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202839 - info - replaying log to store
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 119104 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202840 - info - Replaying tlog file: 034.tlog [169804,...] (6/6)
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 119147 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202841 - info - Some {filename="/mnt/ssd2/arakoon/storageBackend-nsm_0/tlogs/034.tlog";mapping=(171000,4249869)::(170000,599827)} => jump to 4249869
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 360646 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202842 - info - Completed replay of 034.tlog, took 0.241498 seconds, 0 to go
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 360736 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202843 - info - catchup_store completed, store is @ Some ("171411")
Feb 09 10:01:57 stor-02.be-g8-4 arakoon[9642]: 2017-02-09 10:01:57 360749 +0100 - stor-02.be-g8-4 - 9642/0 - arakoon - 202844 - info - CATCHUP end

@domsj can you update this one?

domsj commented

I don't think there's anything to do here.
The ssd on which the arakoon was running couldn't keep up with the load, leading to periodic masterloss of the cluster.
The alba proxy (rightfully) logs this, without stumbling over it.
So closing this issue...