apache/apisix

bug: 2.13.1 /v1/healthcheck cannot get upstream health status

heming79 opened this issue · 11 comments

Current Behavior

curl 127.0.0.1:9090/v1/healthcheck
{}

Expected Behavior

https://github.com/apache/apisix/blob/master/docs/en/latest/control-api.md#get-v1healthcheck
[
  {
    "healthy_nodes": [
      {
        "host": "127.0.0.1",
        "port": 1980,
        "priority": 0,
        "weight": 1
      }
    ],
    "name": "upstream#/upstreams/1",
    "nodes": [
      {
        "host": "127.0.0.1",
        "port": 1980,
        "priority": 0,
        "weight": 1
      },
      {
        "host": "127.0.0.2",
        "port": 1988,
        "priority": 0,
        "weight": 1
      }
    ],
    "src_id": "1",
    "src_type": "upstreams"
  },
  {
    "healthy_nodes": [
      {
        "host": "127.0.0.1",
        "port": 1980,
        "priority": 0,
        "weight": 1
      }
    ],
    "name": "upstream#/routes/1",
    "nodes": [
      {
        "host": "127.0.0.1",
        "port": 1980,
        "priority": 0,
        "weight": 1
      },
      {
        "host": "127.0.0.1",
        "port": 1988,
        "priority": 0,
        "weight": 1
      }
    ],
    "src_id": "1",
    "src_type": "routes"
  }
]

Error Logs

No response

Steps to Reproduce

1. Start APISIX.
2. Add an upstream node.

Environment

  • APISIX version (run apisix version): 2.13.1
  • Operating system (run uname -a): centos7.9
  • OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.19.9.1
  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info): 3.4.0
  • APISIX Dashboard version, if relevant: 2.8.0
  • Plugin runner version, for issues related to plugin runners:
  • LuaRocks version, for installation issues (run luarocks --version): 3.8.0

Is this a steadily recurring problem, or an occasional one?

Is it because there is no request sent to the upstream?

As https://github.com/apache/apisix/blob/master/docs/en/latest/health-check.md shows,

We only start the health check when the upstream is hit by a request. There won't be any health check if an upstream is configured but isn't in use.
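To make the quoted behavior concrete, here is a toy model in Python. It is not APISIX code: `ToyWorker`, `handle_request`, and `healthcheck` are hypothetical names, and the only assumption taken from the docs is that a checker exists only after a request has reached the upstream.

```python
class ToyWorker:
    """Toy stand-in for a single worker process; not APISIX code."""

    def __init__(self):
        # upstream id -> node list, filled in lazily
        self.checkers = {}

    def handle_request(self, upstream_id, nodes):
        # A checker is created only the first time traffic reaches the upstream.
        self.checkers.setdefault(upstream_id, list(nodes))

    def healthcheck(self):
        # With no checkers yet, mirror the empty `{}` seen in the report.
        if not self.checkers:
            return {}
        return [{"src_id": uid, "nodes": nodes} for uid, nodes in self.checkers.items()]


worker = ToyWorker()
print(worker.healthcheck())  # upstream configured, but no request sent yet
worker.handle_request("1", [{"host": "127.0.0.1", "port": 1980}])
print(worker.healthcheck())  # now the upstream shows up
```

In this model, an upstream that is configured but has never received a request simply has no entry to report.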

Sometimes it returns values:
[
  {
    "nodes":[
      {
        "priority":0,
        "weight":1,
        "host":"192.168.6.93",
        "port":20452
      },
      {
        "priority":0,
        "weight":1,
        "host":"192.168.6.91",
        "port":20450
      }
    ],
    "name":"upstream#/apisix/upstreams/409106632274886134",
    "src_type":"upstreams",
    "healthy_nodes":[
      {
        "priority":0,
        "weight":1,
        "host":"192.168.6.91",
        "port":20450
      }
    ],
    "src_id":"409106632274886134"
  },
  {
    "nodes":[
      {
        "priority":0,
        "weight":2,
        "host":"192.168.6.93",
        "port":20731
      },
      {
        "priority":0,
        "weight":1,
        "host":"192.168.6.54",
        "port":20730
      },
      {
        "priority":0,
        "weight":1,
        "host":"192.168.6.91",
        "port":20730
      }
    ],
    "name":"upstream#/apisix/upstreams/409108789875195382",
    "src_type":"upstreams",
    "healthy_nodes":[
      {
        "priority":0,
        "weight":2,
        "host":"192.168.6.93",
        "port":20731
      }
    ],
    "src_id":"409108789875195382"
  }
]

Sometimes it is empty.

/usr/local/apisix/logs # more 2022-05-26_07-00-00__error.log
2022/05/26 06:01:40 [error] 289#289: *27754543 [lua] init.lua:157: http_ssl_phase(): failed to fetch ssl config: failed to find SNI: please check if the client requests via IP or uses an outdated protocol. If you need to report an issue, provide a packet capture file of the TLS handshake., context: ssl_certificate_by_lua*, client: 193.106.191.48, server: 0.0.0.0:443
2022/05/26 06:31:27 [error] 297#297: *28435830 [lua] init.lua:157: http_ssl_phase(): failed to fetch ssl config: failed to find SNI: please check if the client requests via IP or uses an outdated protocol. If you need to report an issue, provide a packet capture file of the TLS handshake., context: ssl_certificate_by_lua*, client: 193.106.191.48, server: 0.0.0.0:443
2022/05/26 06:37:37 [error] 294#294: *28576762 [lua] init.lua:157: http_ssl_phase(): failed to fetch ssl config: failed to find SNI: please check if the client requests via IP or uses an outdated protocol. If you need to report an issue, provide a packet capture file of the TLS handshake., context: ssl_certificate_by_lua*, client: 193.106.191.48, server: 0.0.0.0:443
2022/05/26 07:00:00 [warn] 310#310: *29091039 [lua] log-rotate.lua:266: send USR1 signal to master process [1] for reopening log file, context: ngx.timer

Right now it is empty:
/usr/local/apisix/logs # curl 127.0.0.1:9090/v1/healthcheck
{}

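Also, could unhealthy_nodes be shown together in one place? What I care about most is that unhealthy_nodes trigger an alert promptly, so that the monitoring engineers are notified in time to fix them and the service is restored as quickly as possible.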
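Until the API exposes such a field, one client-side workaround is to derive the list from the existing response. The Python sketch below assumes that a node listed under `nodes` but missing from `healthy_nodes` should be treated as unhealthy; the function name `unhealthy_nodes` and the `UNHEALTHY` output format are made up for illustration, and the sample is trimmed from the output posted above.

```python
import json


def _as_list(value):
    # The thread shows `{}` where a list was expected, so treat non-lists as empty.
    return value if isinstance(value, list) else []


def unhealthy_nodes(healthcheck_body):
    """Map each checker name to the nodes that are absent from healthy_nodes."""
    result = {}
    for entry in _as_list(json.loads(healthcheck_body)):
        healthy = {(n["host"], n["port"]) for n in _as_list(entry.get("healthy_nodes"))}
        bad = [n for n in _as_list(entry.get("nodes"))
               if (n["host"], n["port"]) not in healthy]
        if bad:
            result[entry["name"]] = bad
    return result


sample = """[{"nodes":[{"priority":0,"weight":1,"host":"192.168.6.93","port":20452},
{"priority":0,"weight":1,"host":"192.168.6.91","port":20450}],
"name":"upstream#/apisix/upstreams/409106632274886134","src_type":"upstreams",
"healthy_nodes":[{"priority":0,"weight":1,"host":"192.168.6.91","port":20450}],
"src_id":"409106632274886134"}]"""

for name, nodes in unhealthy_nodes(sample).items():
    for node in nodes:
        print(f"UNHEALTHY {name} {node['host']}:{node['port']}")
```

An empty `{}` response yields no alerts here, so a monitoring job built on this would still need to treat "no data" separately from "all healthy".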

Added at line 110 of /usr/local/apisix/apisix/control/v1.lua:
core.log.error("upstream_nodes: ", core.json.delay_encode(upstreams.nodes))

2022/05/26 08:29:42 [error] 630#630: *30924328 [lua] v1.lua:110: handler(): upstream_nodes: null, client: 127.0.0.1, server: , request: "GET /v1/healthcheck HTTP/1.1", host: "127.0.0.1:9090"

local upstream_mod = require("apisix.upstream")
local get_upstreams = upstream_mod.upstreams

routes is fine and stable, but upstreams is often empty.
local routes = get_routes()
core.log.error("routes: ", core.json.delay_encode(routes))
local upstreams = get_upstreams()
core.log.error("upstreams: ", core.json.delay_encode(upstreams))
core.log.error("upstream_nodes: ", core.json.delay_encode(upstreams.nodes))

2022/05/26 08:49:29 [error] 803#803: 31319311 [lua] v1.lua:106: handler(): routes: [{"clean_handlers":{},"value":{"status":1,"priority":0,"upstream_id":"409123071262202320","host":"service-yyzyh.wanzhuanmohe.cn","name":"service-yyzyh.wanzhuanmohe.cn","methods":["GET","POST"],"update_time":1653386086,"create_time":1653386086,"id":"409123293879081424","uri":"/"},"createdIndex":26,"update_count":0,"has_domain":false,"orig_modifiedIndex":26,"key":"/apisix/routes/409123293879081424","modifiedIndex":26},{"clean_handlers":{},"value":{"status":1,"priority":0,"upstream_id":"409123196135021008","host":"service-bmlt.wanzhuanmohe.cn","name":"service-bmlt.wanzhuanmohe.cn","methods":["GET","POST"],"update_time":1653386133,"create_time":1653386133,"id":"409123373017209296","uri":"/"},"createdIndex":28,"update_count":0,"has_domain":false,"orig_modifiedIndex":28,"key":"/apisix/routes/409123373017209296","modifiedIndex":28},{"clean_handlers":{},"value":{"status":1,"priority":0,"create_time":1653524709,"host":"service-yyzyh.wanzhuanmohe.cn","upstream_id":"409356277903268304","name":"service-yyzyh.wanzhuanmohe.cn/app","methods":["GET","POST"],"update_time":1653525007,"uri":"/app/","id":"409355866207164880","desc":"游戏接口域名/app 转发到广告接口"},"createdIndex":2015,"update_count":0,"has_domain":false,"orig_modifiedIndex":2018,"key":"/apisix/routes/409355866207164880","modifiedIndex":2018},{"clean_handlers":{},"value":{"status":1,"priority":0,"create_time":1653536211,"host":"serviceapi-cbdmcnssp.wanzhuanmohe.cn","upstream_id":"409356277903268304","name":"serviceapi-cbdmcnssp.wanzhuanmohe.cn","methods":["GET","POST"],"update_time":1653536211,"uri":"/","id":"409375163394562512","desc":"cbd 游戏广告接口"},"createdIndex":2068,"update_count":0,"has_domain":false,"orig_modifiedIndex":2068,"key":"/apisix/routes/409375163394562512","modifiedIndex":2068}], client: 127.0.0.1, server: , request: "GET /v1/healthcheck HTTP/1.1", host: "127.0.0.1:9090"
2022/05/26 08:49:29 [error] 803#803: *31319311 [lua] v1.lua:112: handler(): upstreams: , client: 127.0.0.1, server: , request: "GET /v1/healthcheck HTTP/1.1", host: "127.0.0.1:9090"
2022/05/26 08:49:29 [error] 803#803: *31319311 [lua] v1.lua:113: handler(): upstream_nodes: null, client: 127.0.0.1, server: , request: "GET /v1/healthcheck HTTP/1.1", host: "127.0.0.1:9090"

Only upstreams that have received requests will show their status.

/usr/local/apisix # curl 127.0.0.1:9090/v1/upstreams

500 Internal Server Error

An error occurred.

You can report issue to APISIX

Faithfully yours, APISIX.

Is it a parameter problem?
2022/05/26 08:57:59 [error] 944#944: *31488271 lua entry thread aborted: runtime error: /usr/local/apisix/apisix/control/v1.lua:226: bad argument #2 to 'error' (expected table to have __tostring metamethod)
stack traceback:
coroutine 0:
[C]: in function 'error'
/usr/local/apisix/apisix/control/v1.lua:226: in function 'handler'
/usr/local/apisix/apisix/control/router.lua:79: in function 'handler'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:722: in function 'fn'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:590: in function 'match_route_opts'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:612: in function '_match_from_routes'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:663: in function 'match_route'
/usr/local/apisix//deps/share/lua/5.1/resty/radixtree.lua:709: in function 'match'
/usr/local/apisix/apisix/init.lua:795: in function 'http_control'
content_by_lua(nginx.conf:158):2: in main chunk, client: 127.0.0.1, server: , request: "GET /v1/upstreams HTTP/1.1", host: "127.0.0.1:9090"

/usr/local/apisix # curl 127.0.0.1:9090/v1/routes
[{"update_count":0,"has_domain":false,"key":"/apisix/routes/409123293879081424","modifiedIndex":26,"orig_modifiedIndex":26,"value":{"status":1,"update_time":1653386086,"methods":["GET","POST"],"priority":0,"upstream_id":"409123071262202320","id":"409123293879081424","host":"service-yyzyh.wanzhuanmohe.cn","name":"service-yyzyh.wanzhuanmohe.cn","uri":"/*","create_time":1653386086},"clean_handlers":{},"createdIndex":26},{"update_count":0,"has_domain":false,"key":"/apisix/routes/409123373017209296","modifiedIndex":28,"orig_modifiedIndex":28,"value":{"status":1,"update_time":1653386133,"methods":

/v1/upstreams sometimes returns 500 and sometimes can be queried.

Another cluster:
/usr/local/apisix # curl 127.0.0.1:9090/v1/upstreams
[{"has_domain":false,"modifiedIndex":1655,"createdIndex":63,"clean_handlers":{},"key":"/apisix/upstreams/409106632274886134","value":{"type":"roundrobin","create_time":1653376154,"retries":3,"retry_timeout":2,"id":"409106632274886134","update_time":1653468030,"name":"service-yyzyh","scheme":"http","keepalive_pool":{"size":320,"idle_timeout":60,"requests":1000},"desc":"cbd养鱼专业户游戏接口","checks":{"passive":{"unhealthy":{"http_statuses":[429,500,503],"tcp_failures":0,"timeouts":0,"http_failures":0},"healthy":{"http_statuses":[200,201,202,203,204,205,206,207,208,226,300,301,302,303,304,305,306,307,308],"successes":0},"type":"http"},"active":{"unhealthy":{"http_failures":5,"tcp_failures":2,"timeouts":3,"tcp_Failures":2,"interval":1,"http_statuses":[429,404,500,501,502,503,504,505]},"healthy":{"http_statuses":[200,302],"interval":1,"successes":2},"timeout":1,"type":"http","concurrency":10,"http_path":"/","https_verify_certificate":true,"port":80}},"timeout":{"send":6,"read":6,"connect":6},"nodes":[{"host":"192.168.6.93","weight":1,"port":20452},{"host":"192.168.6.91","weight":1,"port":20450}],"pass_host":"pass","hash_on":"vars"}},{"has_domain":false,"modifiedIndex":1722,"createdIndex":88,"clean_handlers":{},"key":"/apisix/upstreams/409108789875195382","value":{"type":"roundrobin","create_time":1653377440,"retries":4,"retry_timeout":2,"id":"409108789875195382","update_time":1653556657,"name":"service-bmlt","scheme":"http","keepalive_pool":{"size":320,"idle_timeout":60,"requests":1000},"desc":"cbd 
百亩良田游戏api接口","checks":{"passive":{"unhealthy":{"http_failures":2,"tcp_failures":2,"timeouts":6,"http_statuses":[429,500,503,502,504,404]},"healthy":{"http_statuses":[200,201,202,203,204,205,206,207,208,226,300,301,302,303,304,305,306,307,308],"successes":5},"type":"http"},"active":{"unhealthy":{"http_failures":5,"tcp_failures":2,"timeouts":3,"tcp_Failures":2,"interval":1,"http_statuses":[429,404,500,501,502,503,504,505]},"healthy":{"http_statuses":[200,302],"interval":1,"successes":2},"type":"http","concurrency":10,"http_path":"/","https_verify_certificate":true,"timeout":1}},"timeout":{"send":6,"read":6,"connect":6},"nodes":[{"host":"192.168.6.54","weight":1,"port":20730},{"host":"192.168.6.91","weight":1,"port":20730},{"host":"192.168.6.93","weight":2,"port":20731},{"host":"192.168.6.51","weight":1,"port":20731}],"pass_host":"pass","hash_on":"vars"}}]
/usr/local/apisix # curl 127.0.0.1:9090/v1/healthcheck
{}

In the case of multiple processes, if the process hit by the request and the process hit by the API are not the same, the health check information cannot be obtained.

> In the case of multiple processes, if the process hit by the request and the process hit by the API are not the same, the health check information cannot be obtained.

That's the truth.

But I don't think it's a problem. In actual use, it only happens for a very short period of time when APISIX starts up. It's just that this phenomenon is amplified in testing.

As the number of requests increases, every worker ends up processing requests and can then return health check data.
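For anyone who needs a fuller picture during that startup window, one possible client-side workaround is to call the endpoint several times and merge whatever comes back. The Python sketch below assumes that separate connections may land on different workers, which is not guaranteed; `fetch_healthcheck`, `merge_snapshots`, and `poll` are hypothetical helper names, and the address is the one used in this thread.

```python
import json
import urllib.request


def fetch_healthcheck(url="http://127.0.0.1:9090/v1/healthcheck"):
    """Fetch one snapshot from the control API over a fresh connection."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.loads(resp.read().decode("utf-8"))


def merge_snapshots(snapshots):
    """Combine several responses, keeping the first entry seen per checker name."""
    merged = {}
    for snap in snapshots:
        if not isinstance(snap, list):  # `{}` means this reply carried no checkers
            continue
        for entry in snap:
            merged.setdefault(entry["name"], entry)
    return list(merged.values())


def poll(fetch=fetch_healthcheck, attempts=10):
    """Call the endpoint several times, hoping to reach different workers."""
    return merge_snapshots(fetch() for _ in range(attempts))


# Offline demonstration with canned replies instead of a live APISIX instance.
replies = iter([{}, [{"name": "upstream#/upstreams/1", "healthy_nodes": []}], {}])
print(poll(fetch=lambda: next(replies), attempts=3))
```

Against a live instance, `poll()` with its defaults would issue ten requests. The merged result can still be empty if none of the workers reached has handled traffic yet.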