hashicorp/terraform-aws-consul

Consul client fails to resolve vault.service.consul on Ubuntu 18 (following the Ubuntu 18 documentation)

When testing a Vault client on Ubuntu 18, dig @localhost vault.service.consul resolves, but a plain dig vault.service.consul does not. As a result, vault commands fail.

I usually run the client on Amazon Linux 2 and CentOS, where it works. On Ubuntu 18 I set up systemd-resolved with the module defaults:

sudo /tmp/terraform-aws-consul/modules/setup-systemd-resolved/setup-systemd-resolved

consul members lists the cluster, and the lookup works when I query localhost directly with dig @localhost vault.service.consul:

ubuntu@ip-10-4-101-58:~$ dig @localhost vault.service.consul

; <<>> DiG 9.11.3-1ubuntu1.13-Ubuntu <<>> @localhost vault.service.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62100
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;vault.service.consul.          IN      A

;; ANSWER SECTION:
vault.service.consul.   0       IN      A       10.4.1.247
vault.service.consul.   0       IN      A       10.4.2.183
vault.service.consul.   0       IN      A       10.4.2.46

;; Query time: 1 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sun Dec 20 00:25:12 UTC 2020
;; MSG SIZE  rcvd: 97

...but it fails with a plain dig vault.service.consul:

ubuntu@ip-10-4-101-58:~$ dig vault.service.consul

; <<>> DiG 9.11.3-1ubuntu1.13-Ubuntu <<>> vault.service.consul
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 28434
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;vault.service.consul.          IN      A

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Sun Dec 20 00:25:22 UTC 2020
;; MSG SIZE  rcvd: 49

ubuntu@ip-10-4-101-58:~$ vault status
Error checking seal status: Get "https://vault.service.consul:8200/v1/sys/seal-status": dial tcp: lookup vault.service.consul on 127.0.0.53:53: no such host
ubuntu@ip-10-4-101-58:~$ dig vault.service.consul +trace

; <<>> DiG 9.11.3-1ubuntu1.13-Ubuntu <<>> vault.service.consul +trace
;; global options: +cmd
;; Received 51 bytes from 127.0.0.53#53(127.0.0.53) in 0 ms

Checking the status of the systemd-resolved service shows:

ubuntu@ip-10-4-101-58:~$ sudo service status systemd-resolved
status: unrecognized service
ubuntu@ip-10-4-101-58:~$ sudo service systemd-resolved status
● systemd-resolved.service - Network Name Resolution
   Loaded: loaded (/lib/systemd/system/systemd-resolved.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-12-19 23:56:42 UTC; 39min ago
     Docs: man:systemd-resolved.service(8)
           https://www.freedesktop.org/wiki/Software/systemd/resolved
           https://www.freedesktop.org/wiki/Software/systemd/writing-network-configuration-managers
           https://www.freedesktop.org/wiki/Software/systemd/writing-resolver-clients
 Main PID: 689 (systemd-resolve)
   Status: "Processing requests..."
    Tasks: 1 (limit: 1140)
   CGroup: /system.slice/systemd-resolved.service
           └─689 /lib/systemd/systemd-resolved

Dec 20 00:22:40 ip-10-4-101-58 systemd-resolved[689]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction w
Dec 20 00:23:27 ip-10-4-101-58 systemd-resolved[689]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction w
Dec 20 00:23:27 ip-10-4-101-58 systemd-resolved[689]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction w

...skipping...

This seems to be the problem.

; <<>> DiG 9.11.3-1ubuntu1.13-Ubuntu <<>> vault.service.consul +trace
;; global options: +cmd
;; Received 51 bytes from 127.0.0.53#53(127.0.0.53) in 0 ms

Shouldn't this normally be 127.0.0.1#53?

I don't know much about how systemd-resolved handles resolv.conf or how Consul is supposed to configure it, but this seems strange:

ubuntu@ip-10-4-101-242:~$ cat /run/systemd/resolve/resolv.conf
# This file is managed by man:systemd-resolved(8). Do not edit.
#
# This is a dynamic resolv.conf file for connecting local clients directly to
# all known uplink DNS servers. This file lists all configured search domains.
#
# Third party programs must not access this file directly, but only through the
# symlink at /etc/resolv.conf. To manage man:resolv.conf(5) in a different way,
# replace this symlink by a static file or a different symlink.
#
# See man:systemd-resolved.service(8) for details about the supported modes of
# operation for /etc/resolv.conf.

nameserver 127.0.0.1
nameserver 10.4.0.2
search service.consul
ubuntu@ip-10-4-101-242:~$ cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 127.0.0.53
search service.consul
options edns0

So a plain dig vault.service.consul is not actually using /run/systemd/resolve/resolv.conf; it goes to the 127.0.0.53 stub listed in /etc/resolv.conf instead.
I'd love to know why that is happening.
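One way to see why the two lookups disagree is to list each file's nameserver entries in order, since resolvers that read /etc/resolv.conf (glibc, dig without an @server and, judging by the vault status error above, the vault CLI) start with the first entry. The helper below is a hypothetical sketch: the function name resolv_nameservers is made up for illustration, and it assumes bash and awk. It only reads the file and changes nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name made up for illustration).
# Print the address of every "nameserver" line in a resolv.conf-style file,
# in file order. Comment lines start with '#' or ';' and never match.
resolv_nameservers() {
  local file="${1:-/etc/resolv.conf}"
  awk '$1 == "nameserver" { print $2 }' "$file"
}
```

Going by the file contents quoted above, running it against /etc/resolv.conf would be expected to print only 127.0.0.53, while /run/systemd/resolve/resolv.conf would give 127.0.0.1 followed by 10.4.0.2.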

I wonder if this is the cause of hashicorp/terraform-aws-vault#223?

Ah, I see you mentioned Vault in your first sentence, so yes, it looks like these are related.

See also #155.

It looks like the /etc/resolv.conf symlink is just not linked correctly...
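To check the symlink theory, it helps to see which file /etc/resolv.conf finally resolves to; the header quoted above says the current file was generated by resolvconf(8) rather than by systemd-resolved. The sketch below is hypothetical (resolv_conf_target and classify_resolv_target are made-up names), assumes bash and GNU readlink -f, and only inspects the link without changing it. The first two target paths are the ones described in the systemd-resolved.service(8) man page; treating /run/resolvconf/resolv.conf as the resolvconf package's output is an assumption about Debian/Ubuntu packaging.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names made up for illustration).
# Classify a resolv.conf target path. The two systemd paths follow
# systemd-resolved.service(8); the resolvconf path is an ASSUMPTION
# about where Debian/Ubuntu's resolvconf package writes its file.
classify_resolv_target() {
  case "$1" in
    /run/systemd/resolve/stub-resolv.conf) echo "systemd-resolved stub (nameserver 127.0.0.53)" ;;
    /run/systemd/resolve/resolv.conf)      echo "systemd-resolved uplink list (bypasses the stub)" ;;
    /run/resolvconf/resolv.conf)           echo "resolvconf(8) generated file" ;;
    *)                                     echo "other: $1" ;;
  esac
}

# Follow every symlink level with readlink -f and classify the final target.
resolv_conf_target() {
  local path="${1:-/etc/resolv.conf}"
  if [ -L "$path" ]; then
    classify_resolv_target "$(readlink -f "$path")"
  else
    echo "not a symlink: $path is a static file"
  fi
}
```

Called with no argument it inspects /etc/resolv.conf; a "resolvconf(8) generated file" result would match the header seen above.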