alba[5376]: (Unix.Unix_error "Cannot assign requested address" bind "")
Closed this issue · 2 comments
yongshengma commented
Hello ,
I have 3 nodes cluster. All 3 got the same issue: (Unix.Unix_error "Cannot assign requested address" bind "")
. Here's info from one node.
Running processes
=================
alba-maintenance_storage-KkcVdVmQcy0ut9jM active 1480
asd-manager active 1416
asd-watcher active 1413
Non-running processes
=====================
alba-asd-yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie activating
Logs of alba-asd-yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie
Jul 24 16:37:19 compute1 sh[5371]: alba: main: found env var ALBA_LWT_ENGINE=libev
Jul 24 16:37:19 compute1 systemd[1]: Started ALBA ASD.
Jul 24 16:37:19 compute1 alba[5376]: alba: main: found env var ALBA_LWT_ENGINE=libev
Jul 24 16:37:19 compute1 alba[5376]: 2019-07-24 16:37:19 150648 +0800 - compute1 - 5376/0000 - alba/asd - 0 - info - Found the following config: {; "asd_id": "yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie", ; "log_level": "info", ; "ips": [; "192.168.6.120"; ], ; "multicast": null, ; "transport": "tcp", ; "home": "/mnt/alba-asd/zrOyeYOcJsj4tHg5/yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie", ; "capacity": 107318611968, ; "rocksdb_block_cache_size": 26200833, ; "port": 8600, ; "node_id": "Vk3Xuot3tY5A7nLi";}
Jul 24 16:37:19 compute1 alba[5376]: 2019-07-24 16:37:19 153144 +0800 - compute1 - 5376/0000 - alba/asd - 1 - info - Interpreted the config as: { Asd_config.Config.ips = ["192.168.6.120"]; port = (Some 8600);; transport = "tcp"; rora_ips = None; rora_port = None;; rora_transport = None; rora_num_cores = None; rora_queue_depth = None;; node_id = "Vk3Xuot3tY5A7nLi";; home = "/mnt/alba-asd/zrOyeYOcJsj4tHg5/yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie";; log_level = "info"; asd_id = (Some "yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie");; __sync_dont_use = true; limit = 99L; capacity = (Some 107318611968L);; buffer_size = None; multicast = None; tls = None;; tcp_keepalive =; { Tcp_keepalive.enable_tcp_keepalive = true; tcp_keepalive_time = 20;; tcp_keepalive_intvl = 20; tcp_keepalive_probes = 3 };; __warranty_void__write_blobs = true; use_fadvise = true;; use_fallocate = true; rocksdb_block_cache_size = (Some 26200833);; lwt_unix_pool_size = 100 }
Jul 24 16:37:19 compute1 alba[5376]: 2019-07-24 16:37:19 153181 +0800 - compute1 - 5376/0000 - alba/asd - 2 - info - Lwt_unix.pool_size : 100
Jul 24 16:37:19 compute1 alba[5376]: 2019-07-24 16:37:19 153224 +0800 - compute1 - 5376/0000 - alba/asd - 3 - info - asd_server version:heads/master-0-ge43faca-dirty
Jul 24 16:37:19 compute1 alba[5376]: 2019-07-24 16:37:19 200196 +0800 - compute1 - 5376/0000 - alba/asd - 4 - info - Closing listening socket on port 8600
Jul 24 16:37:19 compute1 alba[5376]: alba: internal error, uncaught exception:
Jul 24 16:37:19 compute1 alba[5376]: (Unix.Unix_error "Cannot assign requested address" bind "")
Jul 24 16:37:19 compute1 alba[5376]: 2019-07-24 16:37:19 200332 +0800 - compute1 - 5376/0000 - alba/asd - 5 - info - server for ADDR_INET(192.168.6.120,8600) going down: (Unix.Unix_error "Cannot assign requested address" bind "")
Jul 24 16:37:19 compute1 systemd[1]: alba-asd-yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie.service: main process exited, code=exited, status=1/FAILURE
Jul 24 16:37:19 compute1 systemd[1]: Unit alba-asd-yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie.service entered failed state.
Jul 24 16:37:19 compute1 systemd[1]: alba-asd-yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie.service failed.
Logs of alba-maintenance_storage
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 004149 +0800 - compute1 - 1480/0000 - alba/maintenance - 9720 - info - Connecting to ADDR_INET(192.168.6.122,8600)
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 004199 +0800 - compute1 - 1480/0000 - alba/maintenance - 9721 - info - Connecting to ADDR_INET(192.168.6.120,8600)
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 004240 +0800 - compute1 - 1480/0000 - alba/maintenance - 9722 - info - Connecting to ADDR_INET(192.168.6.121,8600)
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 004268 +0800 - compute1 - 1480/0000 - alba/maintenance - 9723 - info - ignoring: Alba_client_errors.Error.Exn(6)
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 007679 +0800 - compute1 - 1480/0000 - alba/maintenance - 9724 - info - connect_with failed: 192.168.6.120 8600 None Net_fd.TCP (fd:27): Networking2.ConnectTimeout
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 007741 +0800 - compute1 - 1480/0000 - alba/maintenance - 9725 - info - connect_with failed: 192.168.6.122 8600 None Net_fd.TCP (fd:31): Networking2.ConnectTimeout
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 007816 +0800 - compute1 - 1480/0000 - alba/maintenance - 9726 - info - yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie "Networking2.ConnectTimeout" was unforeseen
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 007855 +0800 - compute1 - 1480/0000 - alba/maintenance - 9727 - info - yKVPpBz0g20BihAGdTJuhg84RfEnX2Ie "Networking2.ConnectTimeout": should_invalidate:true should_retry:false
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 008152 +0800 - compute1 - 1480/0000 - alba/maintenance - 9728 - info - b2lt3fAUVxhuLZTKlWUCY4lCD0JXyQye "Networking2.ConnectTimeout" was unforeseen
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 008182 +0800 - compute1 - 1480/0000 - alba/maintenance - 9729 - info - b2lt3fAUVxhuLZTKlWUCY4lCD0JXyQye "Networking2.ConnectTimeout": should_invalidate:true should_retry:false
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012381 +0800 - compute1 - 1480/0000 - alba/maintenance - 9730 - info - connect_with failed: 192.168.6.121 8600 None Net_fd.TCP (fd:38): (Unix.Unix_error "Connection refused" connect "")
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012457 +0800 - compute1 - 1480/0000 - alba/maintenance - 9731 - info - connect_with failed: 192.168.6.121 8600 None Net_fd.TCP (fd:23): (Unix.Unix_error "Connection refused" connect "")
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012694 +0800 - compute1 - 1480/0000 - alba/maintenance - 9732 - info - GgFVjTT09hS7OxsDdCKt0zzNfg30VzSO "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012757 +0800 - compute1 - 1480/0000 - alba/maintenance - 9733 - info - GgFVjTT09hS7OxsDdCKt0zzNfg30VzSO "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012818 +0800 - compute1 - 1480/0000 - alba/maintenance - 9734 - info - connect_with failed: 192.168.6.121 8600 None Net_fd.TCP (fd:59): (Unix.Unix_error "Connection refused" connect "")
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012873 +0800 - compute1 - 1480/0000 - alba/maintenance - 9735 - info - connect_with failed: 192.168.6.121 8600 None Net_fd.TCP (fd:56): (Unix.Unix_error "Connection refused" connect "")
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 012937 +0800 - compute1 - 1480/0000 - alba/maintenance - 9736 - info - GgFVjTT09hS7OxsDdCKt0zzNfg30VzSO "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 013000 +0800 - compute1 - 1480/0000 - alba/maintenance - 9737 - info - GgFVjTT09hS7OxsDdCKt0zzNfg30VzSO "(Unix.Unix_error \"Connection refused\" connect \"\")": should_invalidate:false should_retry:false
Jul 24 16:36:49 compute1 alba[1480]: 2019-07-24 16:36:49 016237 +0800 - compute1 - 1480/0000 - alba/maintenance - 9738 - warning - could not receive enough fragments for namespace 52, object "tlog_5ed1e36f-a981-46a9-8a87-e63e9324b5bd" ("\154\188\024S\031 \191\181r\249\208\210\208\228\162.\1720V\183\023R\192[\242:\139Oyx\173\127") chunk 0; got 0 while 2 needed
The alba-asd service cannot activate up, but everything else looks working.
Best regards,
Yongsheng
JeffreyDevloo commented
Hi Yongsheng
Is any other process taking in the port 8600? You can check using 'netstat -tulpn | grep 8600'. Looks like the ALBA process can not bind the socket because of it.
Best regards
yongshengma commented
I'm sorry. I just find the reason. The network for ASD was down.
Thanks for reply!
Yongsheng