HOWTO: Breaking and fixing DNS - Understanding modern DNS on Ubuntu.

One dark and stormy night I broke my DNS. I decided to move beyond /etc/resolv.conf and see what demons (daemons?) were lurking under the hood. “It’s complicated.” This is the story of understanding, debugging and fixing it.

1 /etc/resolv.conf

If you look at /etc/resolv.conf on a Linux system today (Ubuntu 19.10) you will find something like:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 127.0.0.1
search lan

But the file seems to change. I’ve seen it without most of the verbiage above. I’ve seen the file contain both 127.0.0.1 and 127.0.0.53. Confusing. systemd?
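
Since the comments in the file come and go, it helps to look only at the lines the resolver acts on. A minimal sketch, assuming a POSIX shell with awk; the function name list_nameservers is made up for this example:

```shell
# list_nameservers [FILE]
# Print the address from every active (uncommented) "nameserver" line.
# FILE defaults to /etc/resolv.conf.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Running list_nameservers with no argument prints one address per line, which makes it easy to spot a file that lists both 127.0.0.1 and 127.0.0.53.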

2 You can edit /etc/resolv.conf

First let me say that despite the dire warnings above, you can edit /etc/resolv.conf, e.g. to make it look like

# Generated by NetworkManager
search lan
nameserver 9.9.9.9

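And it will work until NetworkManager chooses to overwrite the file. Not sure if sudo chmod 444 /etc/resolv.conf would be enough to keep NetworkManager from overwriting it; probably not, since NetworkManager runs as root and root is not bound by the file's permission bits.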
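
Before editing, it is worth checking what /etc/resolv.conf actually is. On systems running systemd-resolved it is often a symlink into /run/systemd/resolve/ rather than a regular file, in which case an edit lands in a generated file. A small sketch, assuming a POSIX shell; describe_resolv_conf is a hypothetical helper name:

```shell
# describe_resolv_conf [PATH]
# Report whether PATH (default /etc/resolv.conf) is a symlink,
# a regular file, or missing.
describe_resolv_conf() {
    f="${1:-/etc/resolv.conf}"
    if [ -L "$f" ]; then
        # readlink prints the target exactly as stored in the link
        echo "symlink -> $(readlink "$f")"
    elif [ -f "$f" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}
```

If it reports a symlink, the target tells you which program generates the file.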

3 You can make /etc/resolv.conf immutable

If you do edit /etc/resolv.conf you can make it immutable to prevent systemd from updating it:

$ sudo chattr +i /etc/resolv.conf
$ sudo rm /etc/resolv.conf
rm: cannot remove '/etc/resolv.conf': Operation not permitted
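
It is easy to forget that the flag is set and then wonder why nothing can update the file. The sketch below checks for the immutable flag by reading lsattr output. It assumes lsattr from e2fsprogs is installed and that the filesystem supports attributes; the function names are made up for this example:

```shell
# flags_have_i: read one line of `lsattr -d` output on stdin and
# succeed if the flag field (first column) contains "i" (immutable).
flags_have_i() {
    awk '{ found = ($1 ~ /i/) } END { exit !found }'
}

# is_immutable PATH: succeed if PATH carries the immutable flag.
is_immutable() {
    lsattr -d "$1" 2>/dev/null | flags_have_i
}
```

For example, is_immutable /etc/resolv.conf && echo "immutable". To undo the change, run sudo chattr -i /etc/resolv.conf.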

4 Debugging a broken DNS

I was living dangerously and simultaneously playing with https://pi-hole.net/ and letting Ubuntu try to upgrade my system. It went south. DNS stopped working. The following were some of the debugging steps I took to try to understand/fix the issue:

4.1 Testing resolution - is name resolution working?

In this phase of debugging, I try to do name resolution as configured:

dig - no nameserver specified
I ran $ dig www.uu.net to see if everything was working as intended. Nope. No response.
dig - known-good nameserver
I ran $ dig www.uu.net @9.9.9.9 to see if I could resolve against a known-good nameserver. This worked. No issues with connectivity/routing.
dig - 127.0.0.53
I ran $ dig www.uu.net @127.0.0.53 to see if the local systemd-resolved nameserver specified in /etc/resolv.conf was working. Nope.
systemd-resolved - how is it configured?
I ran $ systemd-resolve --status to see how systemd thought DNS was configured. The wireless interface I was using pointed to a nameserver (the proxy server on my wireless router) that should work:
$ systemd-resolve --status
...
Link 3 (wlp2s0)
      Current Scopes: DNS
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 192.168.86.1
          DNS Domain: ~.
                      lan
systemd-resolve - let systemd resolve a name
dig(1) and host(1) are not the only game in town for doing command line DNS look-ups. Systemd (of course) will do it for you:
$ systemd-resolve www.uu.net
www.uu.net: 152.195.32.39

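In this case, it worked, which tells me that the systemd-resolved daemon itself is up and able to resolve names. Note that systemd-resolve asks the daemon directly rather than sending a DNS query to 127.0.0.53.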

try dig again
Try another “normal” lookup:
$ dig www.uu.net

This failed. The conclusion seems to be that whatever the resolver library is looking at (127.0.0.53) is not working.

edit /etc/resolv.conf
Pointing /etc/resolv.conf at working nameservers fixed the problem:
# Generated by NetworkManager
search lan
#nameserver 127.0.0.53  # BROKEN. systemd-resolved nameserver set by NetworkManager
#nameserver 9.9.9.9     # WORKS. quad9 nameserver
nameserver 192.168.86.1 # WORKS. wireless router nameserver
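
The sequence of dig checks above can be wrapped in a small script. This is a sketch, not a definitive tool: it assumes dig is installed, uses 9.9.9.9 as the known-good server and 127.0.0.53 as the stub address (both taken from the steps above), and the function names probe, status, diagnose and triage are made up for this example.

```shell
# probe NAME [SERVER]: succeed if dig returns at least one answer line.
# Lines starting with ";" (e.g. the timeout message) do not count.
probe() {
    dig +short +time=2 +tries=1 "$1" ${2:+"@$2"} 2>/dev/null | grep -q '^[^;]'
}

# status CMD...: print "ok" if CMD succeeds, otherwise "fail".
status() {
    if "$@"; then echo ok; else echo fail; fi
}

# diagnose DEFAULT KNOWN_GOOD STUB: turn three ok/fail results into a hint.
diagnose() {
    default=$1
    known_good=$2
    stub=$3
    if [ "$default" = ok ]; then
        echo "resolution works as configured"
    elif [ "$known_good" != ok ]; then
        echo "no answer from the known-good server: suspect connectivity or routing"
    elif [ "$stub" != ok ]; then
        echo "network is fine but 127.0.0.53 is not answering: suspect the local stub resolver"
    else
        echo "stub answers but the default lookup fails: check what /etc/resolv.conf points at"
    fi
}

# triage [NAME]: run the three probes and print the hint.
triage() {
    name="${1:-www.uu.net}"
    diagnose "$(status probe "$name")" \
             "$(status probe "$name" 9.9.9.9)" \
             "$(status probe "$name" 127.0.0.53)"
}
```

Keeping diagnose separate from the probes means the decision logic can be checked without a network. The failure described here corresponds to diagnose fail ok fail.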

4.1.1 Conclusion - the systemd-resolved stub resolver (127.0.0.53) is not answering

4.2 What name resolution processes are running?

The next question is: what’s (not) running? What’s (not) listening?

To answer these questions, I poked at the network and the running processes:

nmap - look for listeners
nmap did not show a DNS listener at 127.0.0.53 (53/udp is not among the open ports):
gmj@ed home-computing [master] $ sudo nmap -v -sU -PS  127.0.0.53

Starting Nmap 7.60 ( https://nmap.org ) at 2020-05-10 07:51 EDT
Initiating Parallel DNS resolution of 1 host. at 07:51
Completed Parallel DNS resolution of 1 host. at 07:51, 0.02s elapsed
Initiating UDP Scan at 07:51
Scanning 127.0.0.53 [1000 ports]
Completed UDP Scan at 07:51, 2.80s elapsed (1000 total ports)
Nmap scan report for 127.0.0.53
Host is up (0.000049s latency).
Not shown: 997 closed ports
PORT     STATE         SERVICE
68/udp   open|filtered dhcpc
631/udp  open|filtered ipp
5353/udp open|filtered zeroconf

zeroconf
Is zeroconf listening? What is 5353?

It looks like 5353 is multicast DNS.

$ egrep -i domain\|dns /etc/services
domain		53/tcp				# Domain Name Server
domain		53/udp
mdns		5353/tcp			# Multicast DNS
mdns		5353/udp
lsof -i
look at listening ports

Next, I used lsof(1) to look at listening and connected ports, successively grepping out the “known” and “uninteresting”:

gmj@ed home-computing [master] $  sudo lsof -i -n  | egrep -vi established\|dropbox\|ssh\|http\|smtp\|bootp\|ipp
COMMAND     PID            USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
avahi-dae  1064           avahi   12u  IPv4  25434      0t0  UDP *:mdns
avahi-dae  1064           avahi   13u  IPv6  25435      0t0  UDP *:mdns
avahi-dae  1064           avahi   14u  IPv4  25436      0t0  UDP *:42027
avahi-dae  1064           avahi   15u  IPv6  25437      0t0  UDP *:44240
dnsmasq    2538 libvirt-dnsmasq    5u  IPv4  37248      0t0  UDP 192.168.122.1:domain
dnsmasq    2538 libvirt-dnsmasq    6u  IPv4  37249      0t0  TCP 192.168.122.1:domain (LISTEN)
brave     28951             gmj   43u  IPv4 250584      0t0  UDP 224.0.0.251:mdns

Looks like avahi-dae[mon] is listening for multicast DNS (mdns) on 5353, and a dnsmasq run by the libvirt-dnsmasq user is listening on 192.168.122.1:53 (most likely libvirt's virtual network bridge rather than my router, which is at 192.168.86.1), but nothing is listening on port 53 at 127.0.0.53. This is a problem.
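
The same question ("is anything listening on port 53 at this address?") can be asked more directly with ss. The sketch below parses the output of ss -H -lntu. It assumes ss from iproute2, and listens_on is a hypothetical helper name. It only matches the exact address given, so a wildcard listener on 0.0.0.0:53 is not counted.

```shell
# listens_on ADDR PORT
# Read `ss -H -lntu` output on stdin and succeed if some socket is
# bound to exactly ADDR:PORT. An interface suffix such as "%lo" on
# the local address is ignored.
listens_on() {
    awk -v addr="$1" -v port="$2" '
        {
            # Column 5 is "Local Address:Port"; the port follows the last ":".
            n = split($5, parts, ":")
            p = parts[n]
            host = substr($5, 1, length($5) - length(p) - 1)
            sub(/%.*$/, "", host)
            if (host == addr && p == port) found = 1
        }
        END { exit !found }
    '
}
```

Usage: ss -H -lntu | listens_on 127.0.0.53 53 && echo "listening" || echo "nothing there".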

4.3 Why is systemd-resolved not answering - do I care?

Do I really want to debug systemd-resolved? No. I was half planning on upgrading to the latest Ubuntu release (20.04) anyhow. This seems like the time to do it, rather than debugging this problem further.
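
For anyone who does want to dig further, one cheap check is whether the stub listener has been switched off. resolved.conf(5) documents a DNSStubListener= setting, and software that wants port 53 for itself is a plausible reason for it to be disabled. That is a guess, not something verified on this system. The sketch below reads only a single file (drop-ins under /etc/systemd/resolved.conf.d/ are not considered), and stub_listener_setting is a made-up name:

```shell
# stub_listener_setting [FILE]
# Print the last uncommented DNSStubListener= value in FILE
# (default /etc/systemd/resolved.conf), or a note if it is not set.
stub_listener_setting() {
    awk -F= '
        /^[[:space:]]*DNSStubListener[[:space:]]*=/ {
            value = $2
            gsub(/[[:space:]]/, "", value)
        }
        END { print (value == "" ? "unset (built-in default applies)" : value) }
    ' "${1:-/etc/systemd/resolved.conf}"
}
```

A result of "no" would explain a silent 127.0.0.53 even though the daemon is running. systemctl is-active systemd-resolved confirms whether the daemon is up at all.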

4.4 Lessons learned

run servers on dedicated systems
I had been messing with https://pi-hole.net/ on this system (a laptop that mostly does not move/go off the net). There was some confusion/doubt about whether this interacted badly with things/caused the problems. It may have. I un-installed it. But running a dedicated server would be better.
Failed Ubuntu “upgrade”
The actual trigger that made things not work was an attempt to let the Ubuntu installer upgrade the system. This failed in strange ways. After it ran, my system, which had been Ubuntu 19.10, reported itself (in /etc/issue) as 18.04, and the pi-hole logs reported that pi-hole could not find the wireless interface it had been configured to use (but the device was still there, same name, still working…)

5 Next Steps

TODO 5.1 Do a hard upgrade to Ubuntu 20.04

TODO 5.2 Set up a server to run pi-hole and other services

6 Things to learn more about

avahi
So what is avahi-dae[mon]? It looks like a zero-configuration (I wish!) networking service that uses multicast DNS on a local network. Do I need to be running this?
systemd-resolved
I may want to learn more about this, as it is part of the new regime in most Linux distros. But not now.