openvstorage/alba

maintenance keeps on trying and failing to repair an unavailable osd (from a cache backend with 1,0,1,1 policy)

Closed this issue · 0 comments

domsj commented
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 491899 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43347 - info - "(Unix.Unix_error \"No route to host\" connect \"\")" was unforeseen, invalidating pool
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 491907 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43348 - info - "(Unix.Unix_error \"No route to host\" connect \"\")": should_invalidate:true should_retry:false
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 491921 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43349 - info - "(Unix.Unix_error \"No route to host\" connect \"\")" was unforeseen, invalidating pool
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 491930 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43350 - info - "(Unix.Unix_error \"No route to host\" connect \"\")": should_invalidate:true should_retry:false
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 494067 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43351 - warning - could not receive enough fragments for namespace 563818, object " \000\000\000\005\220\128\171\206\179*\245\228\146\131\142\174\151\241\tW97\191\168eJ\002\193\184\rI\233o]\000\000\000\000\000\007\000\000\000" ("\239\216\030\239\0181\221\152f\164J\233^Go=*\216j\015\228\156$\025Pc\194\249\183\207\255(") chunk 0; got 0 while 1 needed
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 494691 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43352 - info - connect_with : 172.17.0.98 8601 None Net_fd.TCP (fd:553)
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 494735 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43353 - info - ignoring: Alba_client_errors.Error.Exn(8)
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 494889 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43354 - warning - could not receive enough fragments for namespace 796, object " \000\000\000\221\011]\194\204\163YM\234&\241^\n\207h\170\004:h\246\202\178\031\143\004\247'\152\237\237\031\242\000\000\000\000\000\000\000\000" ("7\193\153[\1846\141K\202f\147+\23786\029t\219&\164\222\225\170]\173xK\212\247F\001\232") chunk 0; got 0 while 1 needed
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 495251 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43355 - info - connect_with : 172.17.0.98 8601 None Net_fd.TCP (fd:554)
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 495349 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43356 - info - connect_with : 172.17.0.98 8601 None Net_fd.TCP (fd:555)
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 495443 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43357 - warning - could not receive enough fragments for namespace 573194, object " \000\000\0009.\128\142\221N\175\026\011\241]H\194\171H\167\220e\016\183\145}\236@\137\131\244\216\181\175\221=\005\000\000\000block\000\000\128\001\000\000\004\000" ("/\139\026O\226\028A\193\227j\018,Y\249U\195\178\166H\"5\012\244v\243\r>I\025\235\240\139") chunk 0; got 0 while 1 needed
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 495740 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43358 - info - connect_with : 172.17.0.98 8601 None Net_fd.TCP (fd:556)
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 495759 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43359 - info - ignoring: Alba_client_errors.Error.Exn(8)
Aug 09 04:43:50 NY1SRV0002 alba[7254]: 2017-08-09 04:43:50 496220 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43360 - warning - could not receive enough fragments for namespace 564034, object " \000\000\000\243\175\254q\169\200\1292\239M\022\237\192\029\1820\173cjE\225=/\162\176$VX\2275\194\158\000\000\000\000\004\000\000\000" ("\178\1766\132\145o<y\128\133\153\006H\190\148p\228\139\143i 3\239x;\130g\148>r\026\021") chunk 0; got 0 while 1 needed

...

Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 491972 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43382 - info - "Networking2.ConnectTimeout": should_invalidate:true should_retry:false
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 492007 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43383 - info - "Networking2.ConnectTimeout" was unforeseen, invalidating pool
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 492021 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43384 - info - "Networking2.ConnectTimeout": should_invalidate:true should_retry:false
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 492053 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43385 - info - "Networking2.ConnectTimeout" was unforeseen, invalidating pool
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 492062 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43386 - info - "Networking2.ConnectTimeout": should_invalidate:true should_retry:false
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 492088 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43387 - info - "Networking2.ConnectTimeout" was unforeseen, invalidating pool
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 492096 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43388 - info - "Networking2.ConnectTimeout": should_invalidate:true should_retry:false
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 492127 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43389 - info - "Networking2.ConnectTimeout" was unforeseen, invalidating pool
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 492138 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43390 - info - "Networking2.ConnectTimeout": should_invalidate:true should_retry:false
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 494299 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43391 - info - closing (fd:553)
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 494540 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43392 - info - connect_with : 172.17.0.98 8601 None Net_fd.TCP (fd:547)
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 494560 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43393 - info - "Networking2.ConnectTimeout" was unforeseen, invalidating pool
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 494572 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43394 - info - "Networking2.ConnectTimeout": should_invalidate:true should_retry:false
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 494953 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43395 - info - connect_with : 172.17.0.98 8601 None Net_fd.TCP (fd:548)
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 496158 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43396 - info - closing (fd:554)
Aug 09 04:43:51 NY1SRV0002 alba[7254]: 2017-08-09 04:43:51 496186 -0400 - NY1SRV0002 - 7254/0000 - alba/maintenance - 43397 - info - closing (fd:556)

It would be better if maintenance didn't waste so much effort on repairs that cannot succeed. Perhaps it could just delete those objects, as suggested earlier in #437.
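
To make the suggestion concrete, here is a rough sketch of the decision maintenance could take per chunk. It is not alba code (alba is written in OCaml) and every name in it (`Action`, `decide`, `is_cache_backend`) is hypothetical. The only inputs taken from the log are the "got 0 while 1 needed" counts; that a cache backend may safely drop such objects is the assumption behind the proposal above.

```python
from enum import Enum


class Action(Enum):
    REPAIR = "repair"            # enough fragments were fetched to rebuild the chunk
    DELETE = "delete"            # cache data that cannot be rebuilt: drop the object
    RETRY_LATER = "retry_later"  # non-cache data: keep the object, try again later


def decide(got: int, needed: int, is_cache_backend: bool) -> Action:
    """Pick what to do with a chunk after trying to fetch its fragments.

    `got` and `needed` mirror the "got 0 while 1 needed" part of the warning.
    `is_cache_backend` is a hypothetical flag; how maintenance would know this
    is left open here.
    """
    if got >= needed:
        return Action.REPAIR
    if is_cache_backend:
        # Assumption: losing a cached object is harmless, so deleting it is
        # cheaper than retrying a repair that cannot succeed.
        return Action.DELETE
    return Action.RETRY_LATER


if __name__ == "__main__":
    # The situation from the log: 0 fragments received, 1 needed, cache backend.
    print(decide(got=0, needed=1, is_cache_backend=True))
```

With the values from the log this picks `Action.DELETE` instead of looping on the unreachable osd; for a non-cache backend it would still retry later.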