skynetservices/skydns

Incorrect response size calculation

kdima opened this issue · 6 comments

kdima commented

We are seeing issues with DNS responses from the skydns server when the response size is around 550 bytes.

Here is a repro case. I am running skydns locally and I have my resolv.conf modified to point at localhost.

dig some-domain.example.com SRV

; <<>> DiG 9.10.3-P4-Ubuntu <<>> some-domain.example.com SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62612
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 3

;; QUESTION SECTION:
;some-domain.example.com. IN SRV

;; ANSWER SECTION:
some-domain.example.com. 219 IN	SRV 10 33 5601 a.some-domain.example.com.
some-domain.example.com. 219 IN	SRV 10 33 5601 b.some-domain.example.com.
some-domain.example.com. 219 IN	SRV 10 33 5601 c.some-domain.example.com.

;; ADDITIONAL SECTION:
a.some-domain.example.com. 219 IN A 1.2.3.4
b.some-domain.example.com. 273 IN A 1.2.3.5
c.some-domain.example.com. 240 IN A 1.2.3.6

;; Query time: 38 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Feb 13 10:50:54 GMT 2017
;; MSG SIZE  rcvd: 568

Dig succeeds and reports message size 568.

If I now do the same resolution with the Go resolver, using this code:

package main

import (
	"fmt"
	"net"
)

func main() {
	// With empty service and proto, LookupSRV queries the name directly.
	_, a, err := net.LookupSRV("", "", "some-domain.example.com")
	if err != nil {
		fmt.Printf("err is %s\n", err.Error())
	}
	fmt.Printf("res is %+v\n", a)
}

Output is

err is lookup some-domain.example.com on 127.0.0.1:53: read udp 127.0.0.1:44438->127.0.0.1:53: i/o timeout
res is []

On the other hand, if I look up something with a larger response, i.e. around 800 bytes, both dig and the Go resolver work.

While investigating a broken lookup, I added some debugging to skydns and printed what Msg.Len() returns when called inside Fit. It returns a message size of 457, even though dig reports a message size of 568 when it receives the reply.
I have tried two fixes, and both seem to work:

  • Disabling compression inside the Msg.Len function.
  • Adding an extra 100 bytes to the value Msg.Len returns.

So it looks to me like compression is being handled incorrectly for some reason. It also looks like these issues only started after upstream miekg/dns was updated and disabled compression, but I have not verified this.
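To illustrate how a length estimate that assumes compression can come out smaller than the bytes actually sent, here is a rough sketch using only the standard library. It is a simplified model of RFC 1035 name compression, not the miekg/dns implementation: nameLen and namesLen are hypothetical helpers, only owner and target names are counted, and every name is treated as compressible.

```go
package main

import (
	"fmt"
	"strings"
)

// nameLen returns the wire size of one domain name. If seen is non-nil,
// compression is modelled: a suffix already recorded in seen costs a
// 2-byte pointer, and every new suffix is recorded for later names.
func nameLen(name string, seen map[string]bool) int {
	trimmed := strings.TrimSuffix(name, ".")
	if trimmed == "" {
		return 1 // the root name is a single zero octet
	}
	labels := strings.Split(trimmed, ".")
	n := 0
	for i := range labels {
		suffix := strings.Join(labels[i:], ".")
		if seen != nil {
			if seen[suffix] {
				return n + 2 // a pointer replaces the rest of the name
			}
			seen[suffix] = true
		}
		n += 1 + len(labels[i]) // length octet plus label bytes
	}
	return n + 1 // terminating root label
}

// namesLen sums the sizes of all names, with or without compression.
func namesLen(names []string, compress bool) int {
	var seen map[string]bool
	if compress {
		seen = make(map[string]bool)
	}
	total := 0
	for _, name := range names {
		total += nameLen(name, seen)
	}
	return total
}

func main() {
	// Names as they appear in the sanitized dig output above.
	names := []string{
		"some-domain.example.com.", // question
		"some-domain.example.com.", "a.some-domain.example.com.",
		"some-domain.example.com.", "b.some-domain.example.com.",
		"some-domain.example.com.", "c.some-domain.example.com.",
		"a.some-domain.example.com.",
		"b.some-domain.example.com.",
		"c.some-domain.example.com.",
	}
	fmt.Println("estimate with compression:", namesLen(names, true))
	fmt.Println("size without compression: ", namesLen(names, false))
}
```

If the estimate is computed the first way while the message is packed the second way, the estimate can stay under 512 while the wire size does not, which would match the 457 versus 568 gap seen here.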

miekg commented
kdima commented

Here is the tcpdump

12:11:13.284016 IP (tos 0x0, ttl 64, id 38646, offset 0, flags [DF], proto UDP (17), length 101)
    ip6-localhost.57863 > ip6-localhost.domain: [bad udp cksum 0xfe64 -> 0x9cb4!] 12312+ SRV? something.example.com. (73)
12:11:13.322520 IP (tos 0x0, ttl 64, id 38651, offset 0, flags [DF], proto UDP (17), length 596)
    ip6-localhost.domain > ip6-localhost.57863: [bad udp cksum 0x0054 -> 0xb89e!] 12312* q: SRV? something.example.com. 3/0/3 something.example.com. SRV a-07ecf750d42cbe603.something.example.com.:5601 10 33, something.example.com. SRV a-03bddc860f9a014cc.something.example.com.:5601 10 33, something.example.com. SRV a-0901ac67c31f85903.something.example.com.:5601 10 33 ar: a-07ecf750d42cbe603.something.example.com. A 1.2.49.214, a-03bddc860f9a014cc.something.example.com. A 1.2.54.86, a-0901ac67c31f85903.something.example.com. A 1.2.38.64 (568)
12:11:18.284740 IP (tos 0x0, ttl 64, id 39762, offset 0, flags [DF], proto UDP (17), length 101)
    ip6-localhost.48112 > ip6-localhost.domain: [bad udp cksum 0xfe64 -> 0x8721!] 27586+ SRV? something.example.com. (73)
12:11:18.323786 IP (tos 0x0, ttl 64, id 39767, offset 0, flags [DF], proto UDP (17), length 596)
    ip6-localhost.domain > ip6-localhost.48112: [bad udp cksum 0x0054 -> 0x2e9e!] 27586* q: SRV? something.example.com. 3/0/3 something.example.com. SRV a-03bddc860f9a014cc.something.example.com.:5601 10 33, something.example.com. SRV a-0901ac67c31f85903.something.example.com.:5601 10 33, something.example.com. SRV a-07ecf750d42cbe603.something.example.com.:5601 10 33 ar: a-03bddc860f9a014cc.something.example.com. A 1.2.54.86, a-0901ac67c31f85903.something.example.com. A 1.2.38.64, a-07ecf750d42cbe603.something.example.com. A 1.2.49.214 (568)

The request was made using the Go resolver.
Here is the result of the same request using dig:

dig something.example.com SRV

; <<>> DiG 9.10.3-P4-Ubuntu <<>> something.example.com SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62612
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 3

;; QUESTION SECTION:
;something.example.com. IN SRV

;; ANSWER SECTION:
something.example.com. 219 IN	SRV 10 33 5601 a.something.example.com.
something.example.com. 219 IN	SRV 10 33 5601 b.something.example.com.
something.example.com. 219 IN	SRV 10 33 5601 c.something.example.com.

;; ADDITIONAL SECTION:
a.something.example.com. 219 IN A 1.2.3.4
b.something.example.com. 273 IN A 1.2.3.5
c.something.example.com. 240 IN A 1.2.3.6

;; Query time: 38 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Mon Feb 13 10:50:54 GMT 2017
;; MSG SIZE  rcvd: 568

Ignore the different IPs; that is me sanitizing the results.

kdima commented

As far as I understand, the Go resolver does not accept replies larger than 512 bytes.
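A minimal sketch of the size rule involved, assuming the usual RFC 1035 and RFC 6891 behaviour: without EDNS0 a UDP reply is limited to 512 bytes, and a larger reply should be truncated with the TC bit set so the client can retry over TCP. The names udpLimit and mustTruncate are hypothetical and are not skydns or miekg/dns functions.

```go
package main

import "fmt"

// classicUDPLimit is the maximum UDP message size from RFC 1035.
const classicUDPLimit = 512

// udpLimit returns the largest UDP reply a client accepts. advertised is
// the EDNS0 UDP payload size from the query's OPT record, or 0 when the
// query carried no OPT record. Values below 512 are treated as 512.
func udpLimit(advertised int) int {
	if advertised < classicUDPLimit {
		return classicUDPLimit
	}
	return advertised
}

// mustTruncate reports whether a reply of packedSize bytes is too large
// for the client and has to be sent truncated.
func mustTruncate(packedSize, advertised int) bool {
	return packedSize > udpLimit(advertised)
}

func main() {
	fmt.Println(mustTruncate(568, 0))    // real size, client without EDNS0
	fmt.Println(mustTruncate(568, 4096)) // real size, client advertising 4096
	fmt.Println(mustTruncate(457, 0))    // underestimated size, client without EDNS0
}
```

With the real size of 568 the check fires for a client without EDNS0, but with the underestimated 457 it does not, so the oversized reply goes out untruncated. That would be consistent with the timeout above, and with the 800-byte case working, presumably because that estimate still exceeds 512.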

miekg commented
miekg commented
kdima commented

Just to clarify, this is not a stubzone lookup.