openvstorage/alba

spinning connection retries?

Opened this issue · 3 comments

rejecting connection to an asd causes large performance drop and proxylog is filled with following messages:

2017-08-09 11:24:11 125571 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 260 - info - closing (fd:26)
2017-08-09 11:24:11 125651 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 261 - info - "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
2017-08-09 11:24:11 125689 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 262 - warning - could not receive enough fragments for namespace 1, object "tlog_abd0ecaa-b09e-40b1-82e7-af4441ba44dc" ("\142l\2477\180w\003&,D\242\157Z;@B|f\180_\206\166\240{\135`\t\166\149POt") chunk 0; got 0 while 4 needed
2017-08-09 11:24:11 125718 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 263 - info - retrying : Alba_client_errors.Error.Exn(8); backtrace:; Raised at file "src/alba_client_errors.ml", line 23, characters 25-32; Called from file "src/alba_client_download.ml", line 453, characters 8-48; Called from file "src/core/lwt.ml", line 670, characters 66-69
2017-08-09 11:24:11 125904 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 264 - info - connect_with : 172.22.186.33 8603 None Net_fd.TCP (fd:26)
2017-08-09 11:24:11 126093 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 265 - info - closing (fd:26)
2017-08-09 11:24:11 126252 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 266 - info - "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
2017-08-09 11:24:11 126405 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 267 - info - Detected missing fragment namespace_id=1 object_name="tlog_abd0ecaa-b09e-40b1-82e7-af4441ba44dc" object_id="\142l\2477\180w\003&,D\242\157Z;@B|f\180_\206\166\240{\135`\t\166\149POt" location=((7L, 0)) (chunk,fragment,version)=(0,1)
2017-08-09 11:24:11 166627 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 268 - info - connect_with : 172.22.186.33 8603 None Net_fd.TCP (fd:26)
2017-08-09 11:24:11 166725 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 269 - info - closing (fd:26)
2017-08-09 11:24:11 166810 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 270 - info - "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
2017-08-09 11:24:11 166835 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 271 - warning - could not receive enough fragments for namespace 1, object "tlog_ef460c4d-7eb8-42fd-a09a-783a22e302e9" ("\129x\150\b\175\211f\020\201\200fz|r\167\242M\192\021:Ne\245\128>BDn\137!\231#") chunk 0; got 0 while 4 needed
2017-08-09 11:24:11 166854 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 272 - info - retrying : Alba_client_errors.Error.Exn(8); backtrace:; Raised at file "src/alba_client_errors.ml", line 23, characters 25-32; Called from file "src/alba_client_download.ml", line 453, characters 8-48; Called from file "src/core/lwt.ml", line 670, characters 66-69
2017-08-09 11:24:11 167013 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 273 - info - connect_with : 172.22.186.33 8603 None Net_fd.TCP (fd:26)
2017-08-09 11:24:11 167144 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 274 - info - closing (fd:26)
2017-08-09 11:24:11 167285 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 275 - info - "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
2017-08-09 11:24:11 167403 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 276 - info - Detected missing fragment namespace_id=1 object_name="tlog_ef460c4d-7eb8-42fd-a09a-783a22e302e9" object_id="\129x\150\b\175\211f\020\201\200fz|r\167\242M\192\021:Ne\245\128>BDn\137!\231#" location=((7L, 0)) (chunk,fragment,version)=(0,2)
2017-08-09 11:24:11 219140 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 277 - info - connect_with : 172.22.186.33 8603 None Net_fd.TCP (fd:26)
2017-08-09 11:24:11 219363 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 278 - info - closing (fd:26)
2017-08-09 11:24:11 219439 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 279 - info - "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
2017-08-09 11:24:11 219464 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 280 - warning - could not receive enough fragments for namespace 1, object "tlog_734253a9-4bae-4aea-9e75-a94b8226eefc" ("\233'\199\155\019g?+\222\030\146 l\014:\027\255A\200r\"\248\156\184VP\155\169\r\187\004\168") chunk 0; got 0 while 4 needed
2017-08-09 11:24:11 219482 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 281 - info - retrying : Alba_client_errors.Error.Exn(8); backtrace:; Raised at file "src/alba_client_errors.ml", line 23, characters 25-32; Called from file "src/alba_client_download.ml", line 453, characters 8-48; Called from file "src/core/lwt.ml", line 670, characters 66-69
2017-08-09 11:24:11 219675 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 282 - info - connect_with : 172.22.186.33 8603 None Net_fd.TCP (fd:26)
2017-08-09 11:24:11 219798 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 283 - info - closing (fd:26)
2017-08-09 11:24:11 219953 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 284 - info - "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
2017-08-09 11:24:11 220132 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 285 - info - Detected missing fragment namespace_id=1 object_name="tlog_734253a9-4bae-4aea-9e75-a94b8226eefc" object_id="\233'\199\155\019g?+\222\030\146 l\014:\027\255A\200r\"\248\156\184VP\155\169\r\187\004\168" location=((7L, 0)) (chunk,fragment,version)=(0,0)
2017-08-09 11:24:11 440521 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 286 - info - connect_with : 172.22.186.33 8603 None Net_fd.TCP (fd:26)
2017-08-09 11:24:11 440726 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 287 - info - closing (fd:26)
2017-08-09 11:24:11 440860 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 288 - info - "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
2017-08-09 11:24:11 440907 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 289 - warning - could not receive enough fragments for namespace 1, object "tlog_88dae73a-5f55-494e-bc6a-876f10d94cba" ("6#\174\019\230]K\138\233\170\156\026\135l\021-to\199\209\142(U%N\196\129U\172\134\202\020") chunk 0; got 0 while 4 needed
2017-08-09 11:24:11 440942 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 290 - info - retrying : Alba_client_errors.Error.Exn(8); backtrace:; Raised at file "src/alba_client_errors.ml", line 23, characters 25-32; Called from file "src/alba_client_download.ml", line 453, characters 8-48; Called from file "src/core/lwt.ml", line 670, characters 66-69
2017-08-09 11:24:11 441406 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 291 - info - connect_with : 172.22.186.33 8603 None Net_fd.TCP (fd:26)
2017-08-09 11:24:11 441668 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 292 - info - closing (fd:26)
2017-08-09 11:24:11 441934 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 293 - info - "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
2017-08-09 11:24:11 442190 +0200 - ftcmp03 - 2933/0000 - alba/proxy - 294 - info - Detected missing fragment namespace_id=1 object_name="tlog_88dae73a-5f55-494e-bc6a-876f10d94cba" object_id="6#\174\019\230]K\138\233\170\156\026\135l\021-to\199\209\142(U%N\196\129U\172\134\202\020" location=((7L, 0)) (chunk,fragment,version)=(0,0)

Performance drop might be due to spinning retries.

iptables -A INPUT -p tcp --destination-port 8603 -j REJECT

which alba version is this?

# alba version --terse
ee-1.5.10

(Jan asked to create ticket here)

domsj commented

Apparently the behavior is the same for an asd that is down as for one that is blocked by iptables.
To avoid these useless connection attempts towards an asd that is down we would need the disqualification concept for reads too.
(Currently it's a bit too write specific, as a full asd is rightfully rejected for new writes, but should still be used for reading.)