bug: 2.13.1 /v1/healthcheck cannot get upstream health status
heming79 opened this issue · 11 comments
Current Behavior
curl 127.0.0.1:9090/v1/healthcheck
{}
Expected Behavior
https://github.com/apache/apisix/blob/master/docs/en/latest/control-api.md#get-v1healthcheck
[
{
"healthy_nodes": [
{
"host": "127.0.0.1",
"port": 1980,
"priority": 0,
"weight": 1
}
],
"name": "upstream#/upstreams/1",
"nodes": [
{
"host": "127.0.0.1",
"port": 1980,
"priority": 0,
"weight": 1
},
{
"host": "127.0.0.2",
"port": 1988,
"priority": 0,
"weight": 1
}
],
"src_id": "1",
"src_type": "upstreams"
},
{
"healthy_nodes": [
{
"host": "127.0.0.1",
"port": 1980,
"priority": 0,
"weight": 1
}
],
"name": "upstream#/routes/1",
"nodes": [
{
"host": "127.0.0.1",
"port": 1980,
"priority": 0,
"weight": 1
},
{
"host": "127.0.0.1",
"port": 1988,
"priority": 0,
"weight": 1
}
],
"src_id": "1",
"src_type": "routes"
}
]
Error Logs
No response
Steps to Reproduce
1. Start APISIX
2. Add an upstream node
Environment
- APISIX version (run apisix version): 2.13.1
- Operating system (run uname -a): centos7.9
- OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.19.9.1
- etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): 3.4.0
- APISIX Dashboard version, if relevant: 2.8.0
- Plugin runner version, for issues related to plugin runners:
- LuaRocks version, for installation issues (run luarocks --version): 3.8.0
Is this a steadily recurring problem, or an occasional one?
Is it because there is no request sent to the upstream?
As https://github.com/apache/apisix/blob/master/docs/en/latest/health-check.md shows,
we only start the health check when the upstream is hit by a request. There won't be any health check if an upstream is configured but isn't in use.
Sometimes I can see values:
[
{
"nodes":[
{
"priority":0,
"weight":1,
"host":"192.168.6.93",
"port":20452
},
{
"priority":0,
"weight":1,
"host":"192.168.6.91",
"port":20450
}
],
"name":"upstream#/apisix/upstreams/409106632274886134",
"src_type":"upstreams",
"healthy_nodes":[
{
"priority":0,
"weight":1,
"host":"192.168.6.91",
"port":20450
}
],
"src_id":"409106632274886134"
},
{
"nodes":[
{
"priority":0,
"weight":2,
"host":"192.168.6.93",
"port":20731
},
{
"priority":0,
"weight":1,
"host":"192.168.6.54",
"port":20730
},
{
"priority":0,
"weight":1,
"host":"192.168.6.91",
"port":20730
}
],
"name":"upstream#/apisix/upstreams/409108789875195382",
"src_type":"upstreams",
"healthy_nodes":[
{
"priority":0,
"weight":2,
"host":"192.168.6.93",
"port":20731
}
],
"src_id":"409108789875195382"
}
]
Sometimes it is empty:
/usr/local/apisix/logs # more 2022-05-26_07-00-00__error.log
2022/05/26 06:01:40 [error] 289#289: 27754543 [lua] init.lua:157: http_ssl_phase(): failed to fetch ssl config: failed to find SNI: please check if the client requests via IP or uses an outdated protocol. If you need to report an issue, provide a packet capture file of the TLS handshake., context: ssl_certificate_by_lua, client: 193.106.191.48, server: 0.0.0.0:443
2022/05/26 06:31:27 [error] 297#297: 28435830 [lua] init.lua:157: http_ssl_phase(): failed to fetch ssl config: failed to find SNI: please check if the client requests via IP or uses an outdated protocol. If you need to report an issue, provide a packet capture file of the TLS handshake., context: ssl_certificate_by_lua, client: 193.106.191.48, server: 0.0.0.0:443
2022/05/26 06:37:37 [error] 294#294: 28576762 [lua] init.lua:157: http_ssl_phase(): failed to fetch ssl config: failed to find SNI: please check if the client requests via IP or uses an outdated protocol. If you need to report an issue, provide a packet capture file of the TLS handshake., context: ssl_certificate_by_lua, client: 193.106.191.48, server: 0.0.0.0:443
2022/05/26 07:00:00 [warn] 310#310: *29091039 [lua] log-rotate.lua:266: send USR1 signal to master process [1] for reopening log file, context: ngx.timer
Right now it is empty:
/usr/local/apisix/logs # curl 127.0.0.1:9090/v1/healthcheck
{}
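Also, could unhealthy_nodes be shown together in one place? What I care about most is that unhealthy_nodes trigger an alert promptly, so that the monitoring engineers are notified in time to fix them and the service is restored as quickly as possible.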
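One possible way to get this without changing APISIX is to derive the unhealthy nodes on the client side, as the nodes listed under nodes but absent from healthy_nodes. The following is a minimal sketch in Python, assuming the response shape shown earlier in this thread. The function name unhealthy_nodes and the rule of matching nodes by host and port are illustrative assumptions, not part of APISIX.

```python
import json


def unhealthy_nodes(healthcheck):
    """Return {checker name: [nodes missing from healthy_nodes]}.

    `healthcheck` is the decoded body of GET /v1/healthcheck. An empty
    result is shown in this thread as {} rather than [], so anything
    that is not a list is treated as "no data".
    """
    if not isinstance(healthcheck, list):
        return {}
    report = {}
    for entry in healthcheck:
        # Assumption: a node is identified by its (host, port) pair.
        healthy = {(n["host"], n["port"]) for n in entry.get("healthy_nodes", [])}
        down = [n for n in entry.get("nodes", [])
                if (n["host"], n["port"]) not in healthy]
        if down:
            report[entry["name"]] = down
    return report


# Sample trimmed from the non-empty response earlier in this thread.
sample = json.loads("""
[{"nodes": [{"priority": 0, "weight": 1, "host": "192.168.6.93", "port": 20452},
            {"priority": 0, "weight": 1, "host": "192.168.6.91", "port": 20450}],
  "name": "upstream#/apisix/upstreams/409106632274886134",
  "src_type": "upstreams",
  "healthy_nodes": [{"priority": 0, "weight": 1, "host": "192.168.6.91", "port": 20450}],
  "src_id": "409106632274886134"}]
""")

for name, nodes in unhealthy_nodes(sample).items():
    for node in nodes:
        print(f"{name}: {node['host']}:{node['port']} is unhealthy")
```

An alerting job could call this on each poll and notify when the returned dict is non-empty. Note that an empty {} response means "no data from this worker", not "everything is healthy", so it should not clear an existing alert.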
Added at line 110 of /usr/local/apisix/apisix/control/v1.lua:
core.log.error("upstream_nodes: ", core.json.delay_encode(upstreams.nodes))
2022/05/26 08:29:42 [error] 630#630: *30924328 [lua] v1.lua:110: handler(): upstream_nodes: null, client: 127.0.0.1, server: , request: "GET /v1/healthcheck HTTP/1.1", host: "127.0.0.1:9090"
local upstream_mod = require("apisix.upstream")
local get_upstreams = upstream_mod.upstreams
routes is fine and stable here, but upstreams is often empty.
local routes = get_routes()
core.log.error("routes: ", core.json.delay_encode(routes))
local upstreams = get_upstreams()
core.log.error("upstreams: ", core.json.delay_encode(upstreams))
core.log.error("upstream_nodes: ", core.json.delay_encode(upstreams.nodes))
2022/05/26 08:49:29 [error] 803#803: 31319311 [lua] v1.lua:106: handler(): routes: [{"clean_handlers":{},"value":{"status":1,"priority":0,"upstream_id":"409123071262202320","host":"service-yyzyh.wanzhuanmohe.cn","name":"service-yyzyh.wanzhuanmohe.cn","methods":["GET","POST"],"update_time":1653386086,"create_time":1653386086,"id":"409123293879081424","uri":"/"},"createdIndex":26,"update_count":0,"has_domain":false,"orig_modifiedIndex":26,"key":"/apisix/routes/409123293879081424","modifiedIndex":26},{"clean_handlers":{},"value":{"status":1,"priority":0,"upstream_id":"409123196135021008","host":"service-bmlt.wanzhuanmohe.cn","name":"service-bmlt.wanzhuanmohe.cn","methods":["GET","POST"],"update_time":1653386133,"create_time":1653386133,"id":"409123373017209296","uri":"/"},"createdIndex":28,"update_count":0,"has_domain":false,"orig_modifiedIndex":28,"key":"/apisix/routes/409123373017209296","modifiedIndex":28},{"clean_handlers":{},"value":{"status":1,"priority":0,"create_time":1653524709,"host":"service-yyzyh.wanzhuanmohe.cn","upstream_id":"409356277903268304","name":"service-yyzyh.wanzhuanmohe.cn/app","methods":["GET","POST"],"update_time":1653525007,"uri":"/app/","id":"409355866207164880","desc":"游戏接口域名/app 转发到广告接口"},"createdIndex":2015,"update_count":0,"has_domain":false,"orig_modifiedIndex":2018,"key":"/apisix/routes/409355866207164880","modifiedIndex":2018},{"clean_handlers":{},"value":{"status":1,"priority":0,"create_time":1653536211,"host":"serviceapi-cbdmcnssp.wanzhuanmohe.cn","upstream_id":"409356277903268304","name":"serviceapi-cbdmcnssp.wanzhuanmohe.cn","methods":["GET","POST"],"update_time":1653536211,"uri":"/","id":"409375163394562512","desc":"cbd 游戏广告接口"},"createdIndex":2068,"update_count":0,"has_domain":false,"orig_modifiedIndex":2068,"key":"/apisix/routes/409375163394562512","modifiedIndex":2068}], client: 127.0.0.1, server: , request: "GET /v1/healthcheck HTTP/1.1", host: "127.0.0.1:9090"
2022/05/26 08:49:29 [error] 803#803: *31319311 [lua] v1.lua:112: handler(): upstreams: , client: 127.0.0.1, server: , request: "GET /v1/healthcheck HTTP/1.1", host: "127.0.0.1:9090"
2022/05/26 08:49:29 [error] 803#803: *31319311 [lua] v1.lua:113: handler(): upstream_nodes: null, client: 127.0.0.1, server: , request: "GET /v1/healthcheck HTTP/1.1", host: "127.0.0.1:9090"
Only upstreams that have received requests will show their status.
/usr/local/apisix # curl 127.0.0.1:9090/v1/upstreams
500 Internal Server Error
An error occurred.
You can report issue to APISIX
Faithfully yours, APISIX.
Is it a parameter problem?
2022/05/26 08:57:59 [error] 944#944: *31488271 lua entry thread aborted: runtime error: /usr/local/apisix/apisix/control/v1.lua:226: bad argument #2 to 'error' (expected table to have __tostring metamethod)
stack traceback:
coroutine 0:
[C]: in function 'error'
/usr/local/apisix/apisix/control/v1.lua:226: in function 'handler'
/usr/local/apisix/apisix/control/router.lua:79: in function 'handler'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:722: in function 'fn'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:590: in function 'match_route_opts'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:612: in function '_match_from_routes'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:663: in function 'match_route'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:709: in function 'match'
/usr/local/apisix/apisix/init.lua:795: in function 'http_control'
content_by_lua(nginx.conf:158):2: in main chunk, client: 127.0.0.1, server: , request: "GET /v1/upstreams HTTP/1.1", host: "127.0.0.1:9090"
/usr/local/apisix # curl 127.0.0.1:9090/v1/routes
[{"update_count":0,"has_domain":false,"key":"/apisix/routes/409123293879081424","modifiedIndex":26,"orig_modifiedIndex":26,"value":{"status":1,"update_time":1653386086,"methods":["GET","POST"],"priority":0,"upstream_id":"409123071262202320","id":"409123293879081424","host":"service-yyzyh.wanzhuanmohe.cn","name":"service-yyzyh.wanzhuanmohe.cn","uri":"/*","create_time":1653386086},"clean_handlers":{},"createdIndex":26},{"update_count":0,"has_domain":false,"key":"/apisix/routes/409123373017209296","modifiedIndex":28,"orig_modifiedIndex":28,"value":{"status":1,"update_time":1653386133,"methods":
Another cluster:
/usr/local/apisix # curl 127.0.0.1:9090/v1/upstreams
[{"has_domain":false,"modifiedIndex":1655,"createdIndex":63,"clean_handlers":{},"key":"/apisix/upstreams/409106632274886134","value":{"type":"roundrobin","create_time":1653376154,"retries":3,"retry_timeout":2,"id":"409106632274886134","update_time":1653468030,"name":"service-yyzyh","scheme":"http","keepalive_pool":{"size":320,"idle_timeout":60,"requests":1000},"desc":"cbd养鱼专业户游戏接口","checks":{"passive":{"unhealthy":{"http_statuses":[429,500,503],"tcp_failures":0,"timeouts":0,"http_failures":0},"healthy":{"http_statuses":[200,201,202,203,204,205,206,207,208,226,300,301,302,303,304,305,306,307,308],"successes":0},"type":"http"},"active":{"unhealthy":{"http_failures":5,"tcp_failures":2,"timeouts":3,"tcp_Failures":2,"interval":1,"http_statuses":[429,404,500,501,502,503,504,505]},"healthy":{"http_statuses":[200,302],"interval":1,"successes":2},"timeout":1,"type":"http","concurrency":10,"http_path":"/","https_verify_certificate":true,"port":80}},"timeout":{"send":6,"read":6,"connect":6},"nodes":[{"host":"192.168.6.93","weight":1,"port":20452},{"host":"192.168.6.91","weight":1,"port":20450}],"pass_host":"pass","hash_on":"vars"}},{"has_domain":false,"modifiedIndex":1722,"createdIndex":88,"clean_handlers":{},"key":"/apisix/upstreams/409108789875195382","value":{"type":"roundrobin","create_time":1653377440,"retries":4,"retry_timeout":2,"id":"409108789875195382","update_time":1653556657,"name":"service-bmlt","scheme":"http","keepalive_pool":{"size":320,"idle_timeout":60,"requests":1000},"desc":"cbd 
百亩良田游戏api接口","checks":{"passive":{"unhealthy":{"http_failures":2,"tcp_failures":2,"timeouts":6,"http_statuses":[429,500,503,502,504,404]},"healthy":{"http_statuses":[200,201,202,203,204,205,206,207,208,226,300,301,302,303,304,305,306,307,308],"successes":5},"type":"http"},"active":{"unhealthy":{"http_failures":5,"tcp_failures":2,"timeouts":3,"tcp_Failures":2,"interval":1,"http_statuses":[429,404,500,501,502,503,504,505]},"healthy":{"http_statuses":[200,302],"interval":1,"successes":2},"type":"http","concurrency":10,"http_path":"/","https_verify_certificate":true,"timeout":1}},"timeout":{"send":6,"read":6,"connect":6},"nodes":[{"host":"192.168.6.54","weight":1,"port":20730},{"host":"192.168.6.91","weight":1,"port":20730},{"host":"192.168.6.93","weight":2,"port":20731},{"host":"192.168.6.51","weight":1,"port":20731}],"pass_host":"pass","hash_on":"vars"}}]
/usr/local/apisix # curl 127.0.0.1:9090/v1/healthcheck
{}
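To make this mismatch visible, the two control API responses can be compared: any upstream that defines checks in /v1/upstreams but has no entry in /v1/healthcheck has not had a checker created in the worker that answered. The sketch below is in Python and assumes only the fields visible in the outputs in this thread (value.id, value.checks, src_id, src_type). The function name upstreams_without_status is hypothetical.

```python
def upstreams_without_status(upstreams, healthcheck):
    """List ids of upstreams that define `checks` but have no entry in
    the healthcheck response of the worker that answered."""
    reported = set()
    if isinstance(healthcheck, list):  # an empty result arrives as {}
        reported = {e["src_id"] for e in healthcheck
                    if e.get("src_type") == "upstreams"}
    missing = []
    for item in upstreams:
        value = item.get("value", {})
        if value.get("checks") and value.get("id") not in reported:
            missing.append(value["id"])
    return missing


# Trimmed to the fields used here; the ids are the ones from this thread.
upstreams = [
    {"key": "/apisix/upstreams/409106632274886134",
     "value": {"id": "409106632274886134", "checks": {"active": {"type": "http"}}}},
    {"key": "/apisix/upstreams/409108789875195382",
     "value": {"id": "409108789875195382", "checks": {"active": {"type": "http"}}}},
]

# With the empty {} healthcheck response above, both ids are reported.
print(upstreams_without_status(upstreams, {}))
```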
With multiple worker processes, if the worker that handled the proxied request is not the worker that answers the control API call, the health check information cannot be obtained.
That's the truth.
But I don't think it's a problem. In actual use, it only happens for a very short period of time when APISIX starts up. It's just that this phenomenon is amplified in testing.
As the number of requests grows, every worker ends up handling some of them, so each worker can return health check data.
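For monitoring during that start-up window, one workaround is to call the endpoint several times and merge the answers, since each call may be served by a different worker. This is a rough Python sketch under that assumption. merge_healthcheck and poll_healthcheck are hypothetical helper names, and fetch is a caller-supplied function (for example a wrapper around urllib.request.urlopen pointed at the control API address) rather than anything provided by APISIX.

```python
def merge_healthcheck(responses):
    """Merge several decoded /v1/healthcheck bodies into one dict keyed
    by checker name. Later responses overwrite earlier ones."""
    merged = {}
    for body in responses:
        if not isinstance(body, list):  # {} means this worker had no checker
            continue
        for entry in body:
            merged[entry["name"]] = entry
    return merged


def poll_healthcheck(fetch, attempts=10):
    """Call `fetch()` `attempts` times and merge whatever comes back.

    Repetition only raises the chance of reaching each worker; it does
    not guarantee it.
    """
    return merge_healthcheck(fetch() for _ in range(attempts))


# Simulated workers: two answer {}, one has a checker.
fake = iter([{}, [{"name": "upstream#/apisix/upstreams/409106632274886134",
                   "healthy_nodes": []}], {}])
print(sorted(poll_healthcheck(lambda: next(fake), attempts=3)))
```

Which worker accepts a given connection is not under the caller's control, so the merged view is best effort and may still miss a worker.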