skynetservices/skydns

Ping issue?

nrvnrvn opened this issue · 14 comments

I'm not sure this is the proper place for this discussion, though...

We use SkyDNS for our containers: it runs as a service on the Docker host, and we pass --dns ${PRIVATE_IPV4} --dns-search=skydns.local in DOCKER_OPTS.

People set up CNAMEs and use them, and I have received reports that ping fails randomly for them with "ping: unknown host".

Here is a quick and dirty example that reproduces the issue:

supervisord.conf:

[supervisord]
nodaemon=true

[program:etcd]
priority=1
command=/etcd-v3.0.9-linux-amd64/etcd

[program:etcdctl-lol]
command=/etcd-v3.0.9-linux-amd64/etcdctl set /skydns/local/skydns/lol '{"host":"google.com"}'

[program:skydns]
priority=2
command=/skydns -nameservers 8.8.8.8:53,8.8.4.4:53

Dockerfile:

FROM debian

RUN \
    apt-get update && \
    apt-get install -y --no-install-recommends dnsutils curl wget supervisor && \
    curl -kL https://github.com/coreos/etcd/releases/download/v3.0.9/etcd-v3.0.9-linux-amd64.tar.gz -o etcd-v3.0.9-linux-amd64.tar.gz && \
    tar xzvf etcd-v3.0.9-linux-amd64.tar.gz

COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
COPY skydns /skydns

CMD ["/usr/bin/supervisord"]

The skydns binary referenced above is simply precompiled and copied from the folder containing the Dockerfile.

$ docker run --dns=127.0.0.1 --dns-search=skydns.local  skydns-check
WARNING: Localhost DNS setting (--dns=127.0.0.1) may fail in containers.
/usr/lib/python2.7/dist-packages/supervisor/options.py:296: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
  'Supervisord is running as root and it is searching '
2016-09-22 14:29:13,450 CRIT Supervisor running as root (no user in config file)
2016-09-22 14:29:13,450 WARN Included extra file "/etc/supervisor/conf.d/supervisord.conf" during parsing
2016-09-22 14:29:13,459 INFO RPC interface 'supervisor' initialized
2016-09-22 14:29:13,459 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2016-09-22 14:29:13,460 INFO supervisord started with pid 1
2016-09-22 14:29:14,463 INFO spawned: 'etcd' with pid 7
2016-09-22 14:29:14,464 INFO spawned: 'skydns' with pid 8
2016-09-22 14:29:14,469 INFO spawned: 'etcdctl-lol' with pid 9
2016-09-22 14:29:14,738 INFO exited: etcdctl-lol (exit status 0; not expected)
2016-09-22 14:29:15,740 INFO success: etcd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-09-22 14:29:15,740 INFO success: skydns entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-09-22 14:29:15,742 INFO spawned: 'etcdctl-lol' with pid 23
2016-09-22 14:29:15,760 INFO exited: etcdctl-lol (exit status 0; not expected)
2016-09-22 14:29:17,764 INFO spawned: 'etcdctl-lol' with pid 27
2016-09-22 14:29:17,782 INFO exited: etcdctl-lol (exit status 0; not expected)
2016-09-22 14:29:20,789 INFO spawned: 'etcdctl-lol' with pid 31
2016-09-22 14:29:20,808 INFO exited: etcdctl-lol (exit status 0; not expected)
2016-09-22 14:29:21,809 INFO gave up: etcdctl-lol entered FATAL state, too many start retries too quickly
...

$ docker ps
CONTAINER ID        IMAGE                                    COMMAND                  CREATED             STATUS              PORTS               NAMES
09cba0de4071        skydns-check                             "/usr/bin/supervisord"   12 minutes ago      Up 12 minutes                           fervent_banach

$ docker exec -ti 09cba0de4071 /etcd-v3.0.9-linux-amd64/etcdctl get /skydns/local/skydns/lol
{"host":"google.com"}
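The key used here follows the SkyDNS convention of storing a record under the reversed labels of its name, so lol.skydns.local. lives at /skydns/local/skydns/lol. A minimal sketch of that mapping; `etcdPath` is a hypothetical name, and lowercasing the input is a choice made in this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// etcdPath maps a DNS name to the etcd key a SkyDNS-style backend reads:
// the labels in reverse order, joined by "/" and prefixed with the root key.
// Lowercasing is a choice made in this sketch.
func etcdPath(name, root string) string {
	name = strings.ToLower(strings.TrimSuffix(name, "."))
	labels := strings.Split(name, ".")
	// Reverse the labels in place: lol.skydns.local -> local, skydns, lol.
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return "/" + root + "/" + strings.Join(labels, "/")
}

func main() {
	fmt.Println(etcdPath("lol.skydns.local.", "skydns")) // /skydns/local/skydns/lol
}
```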

So nslookup, host and dig (FQDN only) always work, i.e. name resolution works.
wget, curl and all of our internal mechanisms for communicating over HTTP work as well.
I should mention that the google.com example above is not a great one, because wget and curl don't get a successful response; they return something like:

$ docker exec -ti 09cba0de4071 curl -Lv lol
* Rebuilt URL to: lol/
* Hostname was NOT found in DNS cache
*   Trying 87.245.198.20...
* Connected to lol (87.245.198.20) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.38.0
> Host: lol
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< Content-Type: text/html; charset=UTF-8
< Content-Length: 1561
< Date: Thu, 22 Sep 2016 14:48:14 GMT
< 
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 404 (Not Found)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>404.</b> <ins>That’s an error.</ins>
  <p>The requested URL <code>/</code> was not found on this server.  <ins>That’s all we know.</ins>
* Connection #0 to host lol left intact

But that's fine, because the host still resolves successfully to an IP address.

With ping, however, we get something like this:

$ for i in {1..100}; do docker exec -ti 09cba0de4071 ping -c1 lol > /dev/null 2>&1 || echo notok; done | wc -l
      15

The ping utility shipped with at least Debian and Ubuntu behaves as shown above, randomly returning "ping: unknown host" (15 failures out of 100 runs here). BusyBox-based distros (Alpine, for instance), which rely on the BusyBox toolset, always resolve the lol CNAME.

So for now I can only blame ping for using obsolete methods, but I would really appreciate it if you could look at this from your perspective and perhaps shed some light on it!

miekg commented

This is prolly #217

Note that I'm working on a (better) replacement for SkyDNS: CoreDNS (https://coredns.io) and I would love to get some production feedback on that.

will do and get back with the results shortly, thanks!

What should my Corefile look like for the above setup? I use:

.:53 {
    etcd skydns.local {
        upstream 8.8.8.8:53 8.8.4.4:53
        debug
    }
    cache 160 skydns.local
    proxy . 8.8.8.8:53 8.8.4.4:53
}

but it still does not resolve lol.

miekg commented


interesting... Corefile looks about right.

What does dig return in this case?

# dig o-o.debug.lol.skydns.local @localhost
;; Warning: Message parser reports malformed message packet.

; <<>> DiG 9.9.5-9+deb8u6-Debian <<>> o-o.debug.lol.skydns.local @localhost
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 3559
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 2

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;o-o.debug.lol.skydns.local.    IN  A

;; AUTHORITY SECTION:
skydns.local.       160 IN  SOA ns.dns.skydns.local. hostmaster.skydns.local. 1474876628 7200 1800 86400 60

;; ADDITIONAL SECTION:
.           160 CH  TXT "client: etcd cluster is unavailable or misconfigured\; error #0: unsupported protocol scheme \"\"\010\; error #1: unsupported protocol scheme \"\"\010"

;; Query time: 9 msec
;; SERVER: ::1#53(::1)
;; WHEN: Mon Sep 26 07:57:08 UTC 2016
;; MSG SIZE  rcvd: 259

hmm.... client: etcd cluster is unavailable or misconfigured\; error #0: unsupported protocol scheme \"\"\010\; error #1: unsupported protocol scheme \"\"\010

# curl localhost:2379/v2/keys/skydns/local/skydns/lol
{"action":"get","node":{"key":"/skydns/local/skydns/lol","value":"{\"host\":\"google.com\"}","modifiedIndex":7,"createdIndex":7}}

miekg commented


Do other names work, or is it just 'lol'? The JSON looks OK (and I think you would
get an actual JSON error otherwise).

Also debug queries FTW!
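The debug queries in this thread prepend o-o.debug. to the name being looked up; as the outputs here show, the server answers for the underlying name and returns diagnostics (such as the etcd error) as CH TXT records in the additional section. A small sketch of building and recognising such names; `debugName` and `stripDebug` are hypothetical helpers, not CoreDNS functions.

```go
package main

import (
	"fmt"
	"strings"
)

const debugPrefix = "o-o.debug."

// debugName returns the debug-query form of name, as typed into dig.
func debugName(name string) string {
	if !strings.HasSuffix(name, ".") {
		name += "."
	}
	return debugPrefix + name
}

// stripDebug returns the underlying name and whether qname was a debug
// query. The prefix match is case-insensitive, like DNS names.
func stripDebug(qname string) (string, bool) {
	n := len(debugPrefix)
	if len(qname) >= n && strings.EqualFold(qname[:n], debugPrefix) {
		return qname[n:], true
	}
	return qname, false
}

func main() {
	q := debugName("lol.skydns.local")
	fmt.Println(q)             // o-o.debug.lol.skydns.local.
	fmt.Println(stripDebug(q)) // lol.skydns.local. true
}
```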

dig o-o.debug.google.com @localhost works for instance

I'm trying to figure out why I get this:

;; ADDITIONAL SECTION:
. 160 CH TXT "client: etcd cluster is unavailable or misconfigured; error #0: unsupported protocol scheme ""\010; error #1: unsupported protocol scheme ""\010"

miekg commented

> dig o-o.debug.google.com @localhost works for instance

That's prolly going out and not going to the etcd cluster.

Note your logs are full of:

2016-09-22 14:29:15,740 INFO success: skydns entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2016-09-22 14:29:15,742 INFO spawned: 'etcdctl-lol' with pid 23
2016-09-22 14:29:15,760 INFO exited: etcdctl-lol (exit status 0; not expected)
2016-09-22 14:29:17,764 INFO spawned: 'etcdctl-lol' with pid 27

You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#296 (comment)
/Miek

Miek Gieben
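The remark that o-o.debug.google.com "is going out" follows from how the Corefile splits zones: the etcd plugin is only configured for skydns.local, and every other name falls through to `proxy .`. The match is on whole labels, so a name such as notskydns.local does not belong to skydns.local. A simplified sketch of that routing decision, ignoring the rest of CoreDNS's plugin chain; `inZone` and `route` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// inZone reports whether name equals zone or lies below it, comparing
// whole labels case-insensitively. The root zone "." contains every name.
func inZone(name, zone string) bool {
	name = strings.ToLower(strings.TrimSuffix(name, "."))
	zone = strings.ToLower(strings.TrimSuffix(zone, "."))
	if zone == "" {
		return true
	}
	return name == zone || strings.HasSuffix(name, "."+zone)
}

// route models the Corefile above: names in skydns.local go to the etcd
// plugin, everything else falls through to "proxy .".
func route(name string) string {
	if inZone(name, "skydns.local.") {
		return "etcd"
	}
	return "proxy"
}

func main() {
	for _, q := range []string{"o-o.debug.lol.skydns.local.", "o-o.debug.google.com.", "notskydns.local."} {
		fmt.Println(q, "->", route(q))
	}
}
```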

ok cool, setting the endpoint explicitly like this:

.:53 {
    etcd skydns.local {
        upstream 10.33.42.1:53
        endpoint http://localhost:2379
        debug
    }
    proxy . 10.33.42.1:53
}

did the trick!

root@0dc01a1d4d65:/# dig o-o.debug.lol.skydns.local  
;; Warning: Message parser reports malformed message packet.

; <<>> DiG 9.9.5-9+deb8u6-Debian <<>> o-o.debug.lol.skydns.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36591
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 0, ADDITIONAL: 2

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;o-o.debug.lol.skydns.local.    IN  A

;; ANSWER SECTION:
lol.skydns.local.   160 IN  CNAME   google.com.
google.com.     160 IN  A   87.245.198.23
google.com.     160 IN  A   87.245.198.24
google.com.     160 IN  A   87.245.198.27
google.com.     160 IN  A   87.245.198.22
google.com.     160 IN  A   87.245.198.20
google.com.     160 IN  A   87.245.198.26
google.com.     160 IN  A   87.245.198.25
google.com.     160 IN  A   87.245.198.21

;; ADDITIONAL SECTION:
lol.skydns.local.   160 CH  TXT "google.com:0(10,0,,false)[0,]"

;; Query time: 23 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Sep 26 09:36:54 UTC 2016
;; MSG SIZE  rcvd: 249
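The CNAME in this answer comes from the stored value: {"host":"google.com"} names a host rather than an address. Assuming the SkyDNS convention that an IP in host produces an A or AAAA record and anything else a CNAME, with the upstream option then used to resolve the external CNAME target (hence the google.com A records), the decision can be sketched as below. `recordType` is hypothetical, and the struct models only the single field used in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
)

// service models only the field used in this thread's etcd value;
// real SkyDNS/CoreDNS records carry more (port, priority, ttl, ...).
type service struct {
	Host string `json:"host"`
}

// recordType sketches how the stored host becomes a DNS record type:
// an IPv4 address gives A, an IPv6 address AAAA, any other string CNAME.
func recordType(value []byte) (string, error) {
	var s service
	if err := json.Unmarshal(value, &s); err != nil {
		return "", err
	}
	ip := net.ParseIP(s.Host)
	switch {
	case ip == nil:
		return "CNAME", nil
	case ip.To4() != nil:
		return "A", nil
	default:
		return "AAAA", nil
	}
}

func main() {
	for _, v := range []string{`{"host":"google.com"}`, `{"host":"10.0.0.1"}`, `{"host":"::1"}`} {
		t, err := recordType([]byte(v))
		fmt.Println(v, "->", t, err)
	}
}
```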

miekg commented

cool :)

Also interesting that setting the endpoint explicitly fixed it. Let me check/fix that.

Looks very much OK to me. Is ping still unhappy? What happens when you point it
to example.org, which has only 1 IP instead of the 8 that google.com uses?
And what happens when you use www.google.com instead of google.com?

Testing with our local setup shows that multi-dot hosts and single-IP hosts work OK: ping is happy with them, so the failures are definitely caused by #217.

miekg commented

Ack. Yes, that shouldn't happen with CoreDNS; I haven't backported that fix to
skydns, though.

Can CoreDNS be considered a production-ready drop-in replacement for SkyDNS?
I will test it myself, of course, but the main question is: what is the future of SkyDNS, and will CoreDNS replace SkyDNS in Kubernetes?

miekg commented

> can CoreDNS be considered a production-ready drop-in replacement for SkyDNS?

well.. yes, but it's newer. Production ready is when people use it in production.
The etcd code is lifted from SkyDNS and cleaned up, but there could still be
bugs lurking.

I want people to use it in prod and report bugs, so they can be fixed. I'm using
CoreDNS in production myself, but only as an authoritative server without using
the etcd backend.

> I will test it by myself of course but the main question is what is the future of SkyDNS and will CoreDNS replace SkyDNS in Kubernetes?

The hope is CoreDNS will replace SkyDNS in k8s, yes. See the 002 milestone bugs
in CoreDNS's issue tracker; when those are fixed, CoreDNS should be more than capable
of replacing SkyDNS in k8s.
