Kong/kong

dns resolve error!

gbhuoo opened this issue · 9 comments

Kong: 0.12.1
Consul v1.0.3

kong.conf

dns_resolver = 192.168.5.232:8600

But Kong cannot resolve the service name. The error is below:

logs/error.log
2018/02/10 12:20:02 [error] 31047#0: *3997 [lua] responses.lua:121: after(): failed the initial dns/balancer resolve for 'auth_center.service.consul' with: 32844, client: 10.0.0.143, server: kong, request: "GET /auth_center/v3 HTTP/1.1", host: "192.168.5.232:8083"

However, when I query Consul directly, it resolves correctly:
[root@docker4-test consul]# dig @192.168.5.232 -p8600 auth_center.service.consul

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7_4.2 <<>> @192.168.5.232 -p8600 auth_center.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51595
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;auth_center.service.consul. IN A

;; ANSWER SECTION:
auth_center.service.consul. 0 IN A 192.168.5.232

;; Query time: 2 msec
;; SERVER: 192.168.5.232#8600(192.168.5.232)
;; WHEN: Sat Feb 10 12:40:39 CST 2018
;; MSG SIZE rcvd: 71

Please apply this patch: https://github.com/Kong/kong/pull/3177/files and then try again. It won't fix the problem, but it will give better insight into the error.

Thank you. I applied the patch, and now I get this error log:

2018/02/10 19:45:28 [error] 23237#0: *930 [lua] balancer.lua:703: execute(): [dns] 32776. Tried: (short)demo.service.consul:(na) - cache-miss
demo.service.consul:33 - cache-miss/querying/16:192.168.5.232.node.api_test.consul removed/1:192.168.5.232.node.api_test.consul removed/dereferencing SRV
(short)192.168.5.232.node.api_test.consul:(na) - cache-miss
192.168.5.232.node.api_test.consul:16 - cache-hit
, client: 192.168.5.232, server: kong, request: "GET /demo/v3 HTTP/1.1", host: "192.168.5.232:8083"
2018/02/10 19:45:28 [error] 23237#0: *930 [lua] responses.lua:121: after(): failed the initial dns/balancer resolve for 'demo.service.consul' with: 32776, client: 192.168.5.232, server: kong, request: "GET /demo/v3 HTTP/1.1", host: "192.168.5.232:8083"

I think Kong queried Consul and got the correct result (port 32776 for demo.service.consul), so why does it report an error?

dig output:
dig @127.0.1 -p8600 demo.service.consul SRV

; <<>> DiG 9.9.4-RedHat-9.9.4-51.el7_4.2 <<>> @127.0.1 -p8600 demo.service.consul SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47525
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 3
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;demo.service.consul. IN SRV

;; ANSWER SECTION:
demo.service.consul. 0 IN SRV 1 1 32776 192.168.5.232.node.api_test.consul.

;; ADDITIONAL SECTION:
192.168.5.232.node.api_test.consul. 0 IN A 192.168.5.232
192.168.5.232.node.api_test.consul. 0 IN TXT "consul-network-segment="

;; Query time: 1 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Sat Feb 10 19:48:12 CST 2018
;; MSG SIZE rcvd: 154
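The SRV answer above only carries a port and a target host name; a DNS client still has to dereference that target (here via the A record in the ADDITIONAL section) before it has an address to connect to. The sketch below illustrates that two-step resolution by parsing dig's text output. It is a hypothetical helper, not Kong code; the function names are made up for illustration, and it deliberately ignores every non-A record such as Consul's TXT entry.

```python
# Hypothetical sketch (not Kong code): turn an SRV answer plus the ADDITIONAL
# section of dig's text output into an (address, port) pair.
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv(line: str) -> SrvRecord:
    """Parse '<name> <ttl> <class> SRV <priority> <weight> <port> <target>'."""
    fields = line.split()
    if len(fields) != 8 or fields[3] != "SRV":
        raise ValueError(f"not an SRV record: {line!r}")
    priority, weight, port = (int(f) for f in fields[4:7])
    return SrvRecord(priority, weight, port, fields[7].rstrip("."))


def a_records(lines: list[str]) -> dict[str, str]:
    """Map owner name -> IPv4 address, skipping every non-A record (e.g. TXT)."""
    addresses: dict[str, str] = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 5 and fields[3] == "A":
            addresses[fields[0].rstrip(".")] = fields[4]
    return addresses


def resolve_srv(answer: str, additional: list[str]) -> tuple[str, int]:
    """Dereference the SRV target to an address and pair it with the SRV port."""
    srv = parse_srv(answer)
    addresses = a_records(additional)
    if srv.target not in addresses:
        raise LookupError(f"no A record for SRV target {srv.target}")
    return addresses[srv.target], srv.port


if __name__ == "__main__":
    answer = "demo.service.consul. 0 IN SRV 1 1 32776 192.168.5.232.node.api_test.consul."
    additional = [
        "192.168.5.232.node.api_test.consul. 0 IN A 192.168.5.232",
        '192.168.5.232.node.api_test.consul. 0 IN TXT "consul-network-segment="',
    ]
    print(resolve_srv(answer, additional))  # ('192.168.5.232', 32776)
```

With the records from this thread the expected result is 192.168.5.232 on port 32776, which matches what the reporter expected Kong to find.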

I have to try and reproduce this.

What strikes me as odd is this line from the log:

192.168.5.232.node.api_test.consul:16 - cache-hit

This means it treated record type TXT (type code 16) as a cache hit, even though it is not configured to look for that type at all.
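Each "Tried:" entry in the log has the form name:type - status, where the number is the numeric DNS record type (1 = A, 5 = CNAME, 16 = TXT, 33 = SRV). The small decoder below makes such a trace easier to read. It is a hypothetical helper, not part of Kong, and its regular expression is inferred from the log lines in this thread rather than from any documented format.

```python
# Hypothetical helper (not part of Kong): decode "Tried:" entries from the
# balancer error log into (short, name, record type, status).
# The entry format is inferred from the log in this thread, not documented.
import re
from typing import NamedTuple

# Standard DNS resource-record type codes (RFC 1035, RFC 2782, RFC 3596).
RR_TYPES = {1: "A", 5: "CNAME", 16: "TXT", 28: "AAAA", 33: "SRV"}

TRY_ENTRY = re.compile(
    r"^(?P<short>\(short\))?(?P<name>[^:]+):(?P<qtype>\d+|\(na\)) - (?P<status>.*)$"
)


class Attempt(NamedTuple):
    short: bool   # True when the entry is prefixed with "(short)"
    name: str
    rr_type: str  # e.g. "SRV", "TXT", or "n/a" for "(na)"
    status: str


def decode_attempt(line: str) -> Attempt:
    match = TRY_ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised log entry: {line!r}")
    qtype = match.group("qtype")
    rr_type = "n/a" if qtype == "(na)" else RR_TYPES.get(int(qtype), f"TYPE{qtype}")
    return Attempt(
        match.group("short") is not None,
        match.group("name"),
        rr_type,
        match.group("status"),
    )


if __name__ == "__main__":
    for entry in [
        "(short)demo.service.consul:(na) - cache-miss",
        "demo.service.consul:33 - cache-miss/querying",
        "192.168.5.232.node.api_test.consul:16 - cache-hit",
    ]:
        print(decode_attempt(entry))
```

Decoded this way, the last entry reads as a TXT cache hit for the SRV target, which is exactly the odd step pointed out above.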

Can you try disabling the TXT entry returned by Consul?

Found the issue. The problem was indeed caused by the TXT entry, which was not properly handled by the DNS client.

As a workaround you can try:

  • remove the TXT entry
  • set the dns_order property to SRV, A, CNAME (effectively removing the LAST entry from the default order LAST, SRV, A, CNAME)
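Why both workarounds help can be illustrated with a deliberately simplified model. It assumes, as the trace above suggests, that the client cached the ADDITIONAL-section records for the SRV target and remembered TXT (16) as the last successful type for that name. This is not how Kong's DNS client is actually implemented; the function names, the cache layout, and the "remembered TXT" state are all hypothetical.

```python
# Hypothetical, simplified model (NOT Kong's DNS client): shows how a "LAST"
# entry in dns_order can return a useless TXT record once TXT has been
# remembered as the last successful type for a name.
A, CNAME, TXT, SRV = 1, 5, 16, 33
ORDER_TYPES = {"SRV": SRV, "A": A, "CNAME": CNAME}

TARGET = "192.168.5.232.node.api_test.consul"


def lookup(name, dns_order, cache, last_type):
    """Return (type, records) for the first dns_order entry with cached records."""
    for entry in dns_order:
        if entry == "LAST":
            qtype = last_type.get(name)  # type remembered as last success
            if qtype is None:
                continue
        else:
            qtype = ORDER_TYPES[entry]
        records = cache.get((name, qtype))
        if records:
            return qtype, records  # first non-empty cache entry wins
    return None


def scenario(cache_txt=True):
    """Assumed cache state after Consul's SRV answer and ADDITIONAL section."""
    cache = {(TARGET, A): ["192.168.5.232"]}
    if cache_txt:
        cache[(TARGET, TXT)] = ['"consul-network-segment="']
    last_type = {TARGET: TXT}  # the assumed bug: TXT remembered as "last"
    return cache, last_type


if __name__ == "__main__":
    cache, last = scenario()
    # Default-style order: the TXT record wins, so there is no address.
    print(lookup(TARGET, ["LAST", "SRV", "A", "CNAME"], cache, last))
    # Workaround: drop LAST from dns_order.
    print(lookup(TARGET, ["SRV", "A", "CNAME"], cache, last))
    # Workaround: no TXT entry from Consul, so LAST falls through to A.
    cache, last = scenario(cache_txt=False)
    print(lookup(TARGET, ["LAST", "SRV", "A", "CNAME"], cache, last))
```

In this model either workaround avoids the TXT hit. Without a cached TXT record, the LAST entry finds nothing and falls through to SRV and then A. Without LAST in the order, the TXT type is never consulted at all.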

Let's keep this open until the new dns-client has been released and the Kong dependency has been updated.

Thank you very much. I followed your suggestion and the problem is solved.


@gbhuoo which Kong version did you use to fix this issue? Eagerly waiting for your reply.

@mayank-allen this issue was fixed 7 years ago. So unless you have an extremely old version (in which case we'll ask you to upgrade first), this is most likely not the same cause as your issue (assuming you have one).

If you have an issue, it's probably best to file a new issue with details on how to reproduce it and what errors you get.

@Tieske this is the recent thread from this year: #12568
I didn't understand the author's last comment in that thread, since we are not using any Lua setup directly; we are using the Kong image, version 3.5.0.
Please check the above thread and let me know how I can fix my issue.
Eagerly waiting for your response.