Installation failed - DNS/resolvconf issues
-
I recently tried to install Cloudron on a fresh ISO install of both 20.04 and 18.04. On both distros the installation is failing when it tries to grab nginx:
=> installing nginx for xenial for TLSv3 support
curl -sL http://nginx.org/packages/ubuntu/pool/nginx/n/nginx/nginx_1.18.0-2~${ubuntu_codename}_amd64.deb -o /tmp/nginx.deb
It gets to that point and stops. However, the issue seems much deeper than that. Upon further inspection, DNS resolution breaks during the Cloudron installation. Uninstalling resolvconf fixes resolution, but naturally it's a part of the Cloudron pre-reqs.
Is this known, and can anyone else reproduce? I don't feel like I have a complex environment. Just Ubuntu running on KVM behind a firewall. DNS outbound is allowed, and I am able to do an nslookup successfully against multiple internal and external DNS servers. It looks like some part of resolvconf (or related) is broken.
Darn it, I was really looking forward to trying out Cloudron last night! Any help is much appreciated. If it ends up being an upstream issue, maybe there's a workaround that we can apply? I was considering just removing resolvconf in the install script and gambling on how critical it is.
-
Welcome here,
I think we have had something similar in the past; I'm trying to remember what the workaround was. Once the installer has failed and DNS resolving breaks, can you take a look at the resolvconf and unbound logs? The step before downloading nginx would install those two packages, amongst others, so maybe one of them simply fails to start and we have to check why.
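For anyone following along, here is a rough sketch of one way to gather those logs on a systemd-based Ubuntu host. The function name collect_dns_logs and the 50-line limit are made up for illustration and are not part of the Cloudron installer.

```shell
# Sketch: print the state and recent journal entries of the two
# DNS-related services, plus the resolver file they manage.
# collect_dns_logs is a hypothetical helper name; 50 lines is arbitrary.
collect_dns_logs() {
    for unit in resolvconf unbound; do
        echo "=== $unit ==="
        # 'status' exits non-zero for inactive units, so do not abort on it
        systemctl status "$unit" --no-pager 2>&1 || true
        journalctl -u "$unit" --no-pager -n 50 2>&1 || true
    done
    echo "=== /etc/resolv.conf ==="
    cat /etc/resolv.conf 2>&1 || true
}
```

Running collect_dns_logs as root right after the installer fails, and pasting the output, should give enough to see whether either service failed to start.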
-
Here are what the services look like. They look to be running, even though unbound looks unhappy. I even tried disabling apparmor for troubleshooting and no luck. I might get some sleep before long so it's entirely likely that I'll be slow at responding for a bit.
● resolvconf.service - Nameserver information manager
   Loaded: loaded (/lib/systemd/system/resolvconf.service; enabled; vendor preset: enabled)
   Active: active (exited) since Wed 2021-02-17 15:14:21 UTC; 5min ago
     Docs: man:resolvconf(8)
  Process: 419 ExecStart=/sbin/resolvconf --enable-updates (code=exited, status=0/SUCCESS)
 Main PID: 419 (code=exited, status=0/SUCCESS)

Warning: journal has been rotated since unit was started, output may be incomplete.

● unbound.service - Unbound DNS server
   Loaded: loaded (/lib/systemd/system/unbound.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2021-02-17 15:14:26 UTC; 4min 48s ago
     Docs: man:unbound(8)
  Process: 715 ExecStartPre=/usr/lib/unbound/package-helper chroot_setup (code=exited, status=0/SUCCESS)
  Process: 757 ExecStartPre=/usr/lib/unbound/package-helper root_trust_anchor_update (code=exited, status=0/SUCCESS)
 Main PID: 797 (unbound)
    Tasks: 1 (limit: 38494)
   Memory: 15.2M
   CGroup: /system.slice/unbound.service
           └─797 /usr/sbin/unbound -d

Feb 17 15:14:26 cloudron-temp systemd[1]: Starting Unbound DNS server...
Feb 17 15:14:26 cloudron-temp package-helper[773]: /var/lib/unbound/root.key has content
Feb 17 15:14:26 cloudron-temp package-helper[773]: fail: the anchor is NOT ok and could not be fixed
Feb 17 15:14:26 cloudron-temp unbound[797]: [797:0] notice: init module 0: subnet
Feb 17 15:14:26 cloudron-temp unbound[797]: [797:0] notice: init module 1: validator
Feb 17 15:14:26 cloudron-temp unbound[797]: [797:0] notice: init module 2: iterator
Feb 17 15:14:26 cloudron-temp unbound[797]: [797:0] info: start of service (unbound 1.9.4).
Feb 17 15:14:26 cloudron-temp systemd[1]: Started Unbound DNS server.
-
Gave it a shot, and it's still angry and not wanting to resolve things.
It's super strange for sure. It's the same across 18.04 and 20.04 ISO installs and a 20.04 cloud image deployment. I did verify port 53 was open outbound at the firewall to be safe, even though it doesn't seem to be getting that far.
● unbound.service - Unbound DNS server
   Loaded: loaded (/lib/systemd/system/unbound.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2021-02-17 17:05:48 UTC; 23s ago
     Docs: man:unbound(8)
  Process: 1496 ExecStartPre=/usr/lib/unbound/package-helper chroot_setup (code=exited, status=0/SUCCESS)
  Process: 1501 ExecStartPre=/usr/lib/unbound/package-helper root_trust_anchor_update (code=exited, status=0/SUCCESS)
 Main PID: 1505 (unbound)
    Tasks: 1 (limit: 38494)
   Memory: 7.5M
   CGroup: /system.slice/unbound.service
           └─1505 /usr/sbin/unbound -d

Feb 17 17:05:48 cloudron-temp systemd[1]: Starting Unbound DNS server...
Feb 17 17:05:48 cloudron-temp package-helper[1504]: /var/lib/unbound/root.key has content
Feb 17 17:05:48 cloudron-temp package-helper[1504]: fail: the anchor is NOT ok and could not be fixed
Feb 17 17:05:48 cloudron-temp unbound[1505]: [1505:0] notice: init module 0: subnet
Feb 17 17:05:48 cloudron-temp unbound[1505]: [1505:0] notice: init module 1: validator
Feb 17 17:05:48 cloudron-temp unbound[1505]: [1505:0] notice: init module 2: iterator
Feb 17 17:05:48 cloudron-temp unbound[1505]: [1505:0] info: start of service (unbound 1.9.4).
Feb 17 17:05:48 cloudron-temp systemd[1]: Started Unbound DNS server.
Adding the below to my unbound configuration fixes resolution (I noticed that it was trying to use IPv6, whereas the server only has IPv4 addressing). Also, if I don't give it the forward zone, it fails, and if I leave out the DNSSEC part, it fails too.
harden-dnssec-stripped: no
server:
    do-ip4: yes
    do-ip6: no
forward-zone:
    name: "."
    forward-addr: x.x.x.x
Post changes I tried the install again and it succeeded. I'm happy to keep troubleshooting unbound if you have any ideas aside from what I did. The changes got it going, but my tinfoil-hat-wearing alter ego would probably like to enable dnssec again.
##############################################
         Cloudron Setup (latest)
##############################################

Follow setup logs in a second terminal with:
$ tail -f /var/log/cloudron-setup.log

Join us at https://forum.cloudron.io for any questions.

=> Updating apt and installing script dependencies
=> Checking version
=> Downloading version 6.1.2 ...
=> Installing base dependencies and downloading docker images (this takes some time) ...
=> Installing version 6.1.2 (this takes some time) ...
=> Waiting for cloudron to be ready (this takes some time) ....

After reboot, visit https://x.x.x.x and accept the self-signed certificate to finish setup.

The server has to be rebooted to apply all the settings. Reboot now ? [Y/n]
-
@theciscogeek Is there a way for us to test this setup? If it's some public cloud / VPS provider, we can sign up and test as well.
-
Interestingly, if unbound-anchor tries to check the root.key using its built-in resolution method (I'm guessing the system resolv.conf?), it fails. If I give it a resolv.conf that points at a DNS server other than the local unbound, it works.
root@cloudron-temp:~# unbound-anchor -v
/var/lib/unbound/root.key has content
fail: the anchor is NOT ok and could not be fixed
root@cloudron-temp:~# unbound-anchor -v -f resolv.conf
/var/lib/unbound/root.key has content
no last_success probe time in anchor file
/etc/unbound/icannbundle.pem: No such file or directory
using builtin certificate
have 1 trusted certificates
resolved server address 72.21.81.189
resolved server address 2606:2800:11f:bb5:f27:227f:1bbf:a0e
connect to 72.21.81.189
fetched root-anchors/root-anchors.xml (690 bytes)
connect to 72.21.81.189
fetched root-anchors/root-anchors.p7s (4182 bytes)
signer 0: Subject: /O=ICANN/CN=dnssec@iana.org/emailAddress=dnssec@iana.org
the PKCS7 signature verified
XML was parsed successfully, 1 keys
success: the anchor has been updated using the cert
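To make the second command easier to reproduce, here is a minimal sketch of how such a resolv.conf could be generated. The helper names make_resolv_conf and check_anchor_via are invented for illustration, and 192.0.2.53 in the usage note is a documentation placeholder, not a real resolver. Only the -v and -f flags shown above are used.

```shell
# Sketch: build a throwaway resolv.conf that names a single upstream
# resolver, then hand it to unbound-anchor with -f as in the post above.
# Both function names are hypothetical.
make_resolv_conf() {
    # $1 = resolver IP; prints the path of the generated file
    f="$(mktemp)" || return 1
    printf 'nameserver %s\n' "$1" > "$f"
    printf '%s\n' "$f"
}

check_anchor_via() {
    # $1 = resolver IP to use instead of unbound-anchor's default lookup
    conf="$(make_resolv_conf "$1")" || return 1
    # Read the verbose output for "success" or "fail" lines rather than
    # relying on the exit status alone.
    unbound-anchor -v -f "$conf"
    rm -f "$conf"
}
```

Usage would be something like check_anchor_via 192.0.2.53, with the placeholder replaced by a resolver that is reachable from the host.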
-
@girish So far this has all been tested on a private cloud. I'll spin up an instance on a public cloud to verify and let you know if it occurs there so you can test too.
The installed versions of unbound and unbound-anchor are both 1.9.4-2ubuntu1.1.
-
I finally tracked down the issue. And I have a red mark on my face now from repeated facepalming.
I checked firewall rules several times and saw that the traffic was allowed. Port 53 is allowed for both TCP and UDP outbound. Here's what I was seeing (simulated screenshots for documentation):
Port 53 is in both lists. What I was missing, however, was a destination NAT rule that forced all DNS traffic through local resolvers. It looks like unbound-anchor uses root servers unless you specify otherwise with a resolv.conf. It seems like unbound doesn't appreciate something masquerading as a different DNS server when it comes to DNSSEC.
Adding a rule to bypass this redirection for the Cloudron host resolved (pun intended) the issues. Alternatively, adding the root servers to the allowedDnsServers list would also have resolved the problem and will be a better long-term solution.
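For anyone who suspects the same kind of redirection, one rough way to check is to send a query to an address that should never answer DNS and see whether a reply comes back anyway. The sketch below assumes dig is installed; the function names are invented for illustration, and 192.0.2.1 is a reserved documentation address (TEST-NET-1). A "no-answer" result does not prove the path is clean, since a firewall may simply drop the packet.

```shell
# Sketch: detect transparent DNS redirection (e.g. a destination NAT rule).
# classify_probe and probe_dns_redirect are hypothetical helper names.
classify_probe() {
    # $1 = exit status of dig, $2 = answer text (may be empty)
    if [ "$1" -eq 0 ] && [ -n "$2" ]; then
        echo "intercepted"   # something answered on behalf of 192.0.2.1
    else
        echo "no-answer"     # timeout or error, as expected for that address
    fi
}

probe_dns_redirect() {
    command -v dig >/dev/null 2>&1 || { echo "dig-missing"; return 2; }
    # One attempt, 3 second timeout, against an address with no DNS server
    answer="$(dig +short +time=3 +tries=1 @192.0.2.1 example.com A 2>/dev/null)"
    classify_probe "$?" "$answer"
}
```

If probe_dns_redirect prints "intercepted" on the affected host, port 53 traffic is being rewritten somewhere along the path, which matches the symptoms in this thread.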
Thanks for all of your help throughout this journey of cognitive enrichment. Hopefully if future people who have outbound communication secured run into this issue, they'll find this.