DNS lookup failure MX for yandex.com
-
Where is this coming from? The mail server logs? Can you check if unbound is running with:
systemctl status unbound
You can also restart unbound from the Services view.
-
-
@robi I think the delay is the source of the problem. Do you know why it takes so long?
IIUC, nodejs uses the c-ares library underneath for DNS queries, and ARES_OPT_TIMEOUTMS has a default of 5 seconds. If it doesn't get a response within that time, it treats the query as a failure (since DNS resolution is UDP-based, it has to rely on timeouts). From a quick read of https://github.com/haraka/Haraka/blob/master/outbound/mx_lookup.js, it doesn't seem like Haraka can configure these timeouts.
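For reference, Node's own dns Resolver class accepts `timeout` and `tries` options. Below is a minimal sketch of a lookup with a longer timeout, assuming a Node version whose Resolver constructor supports both options. The helper names `makeResolver` and `lookupMx` and the 10-second value are made up for illustration, and nothing here shows that Haraka can be told to use such a resolver.

```javascript
'use strict';

const { Resolver } = require('dns').promises;

// Hypothetical helper: build a resolver with a longer per-query timeout.
// `timeout` is in milliseconds; `tries` is how often each server is tried.
function makeResolver(servers, timeoutMs = 10000, tries = 2) {
    const resolver = new Resolver({ timeout: timeoutMs, tries });
    resolver.setServers(servers); // e.g. ['127.0.0.1'] for the local unbound
    return resolver;
}

// Look up MX records and return either the records or the error code
// (for example 'ETIMEOUT' or 'ESERVFAIL') instead of throwing.
async function lookupMx(resolver, domain) {
    try {
        return { ok: true, records: await resolver.resolveMx(domain) };
    } catch (e) {
        return { ok: false, code: e.code };
    }
}

// Usage (performs a real DNS query):
// lookupMx(makeResolver(['127.0.0.1']), 'gov.bc.ca').then(console.log);
```

If the lookup still ends in ETIMEOUT with a much longer timeout, the 5-second default is probably not the real problem.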
-
@girish said in DNS lookup failure MX for yandex.com:
host -t MX yandex.com 127.0.0.1
I'm having the same issue here but for a different domain: gov.bc.ca.
host -t MX gov.bc.ca 127.0.0.1
gives me this output:
;; connection timed out; no servers could be reached
The DNS servers set in netplan are the following (so shouldn't be a problem):
nameservers:
  addresses: [1.1.1.1, 1.0.0.1, 8.8.8.8, 213.186.33.99]
  search: []
Are there any concerns with the above? Is there a different place where the DNS addresses should be set? The only email failing to send is to the gov.bc.ca domain; everything else works fine. External checks show the MX record for gov.bc.ca without any problem, so it's not a DNS issue on their end.
The only thing in the unbound service logs is this repeatedly:
Nov 22 16:16:10 vps-<name> unbound[2484160]: [2484160:0] info: generate keytag query _ta-4f66. NULL IN
From the mail logs, I only see these two entries for the gov.bc.ca attempts:
2021-11-22T08:58:37.000Z [NOTICE] [3306618A-B219-42B7-A2F5-6DBF86CE783E.1.1] [outbound] MX Lookup for gov.bc.ca failed: Error: queryMx ETIMEOUT gov.bc.ca
2021-11-22T08:58:37.000Z [INFO] [3306618A-B219-42B7-A2F5-6DBF86CE783E.1.1] [outbound] bouncing mail: Too many failures (DNS lookup failure: Error: queryMx ETIMEOUT gov.bc.ca)
-
-
@girish Unfortunately it doesn't.
If I take out the 127.0.0.1, it works fine though (which I think is only the case because I temporarily added 1.1.1.1 to the /etc/resolv.conf file to see if that'd help at all):
host -t NS gov.bc.ca
gov.bc.ca name server pubdns-k.spanbc.ca.
gov.bc.ca name server pubdns-c.spanbc.ca.
I tried restarting unbound, but I get the same issue when running the host command with 127.0.0.1, FYI.
host -t NS gov.bc.ca 127.0.0.1
;; connection timed out; no servers could be reached
Any suggestions then? I'm always a bit confused when it comes to DNS in Cloudron servers... does Cloudron force its own DNS lookup server on a VPS with Cloudron, so that any local config isn't really applicable the way it would be on a server without Cloudron? Is that why we try the host command with 127.0.0.1, because it sends the query through Cloudron's local DNS server (unbound)?
-
@d19dotca yes, all DNS requests go via unbound (which is running on 127.0.0.1). Docker is configured to make DNS requests via unbound (so all apps will also indirectly use unbound).
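Since everything goes through unbound on 127.0.0.1, one way to narrow a failure down is to run the same lookup against unbound and against a public resolver, then compare the results. Here is a rough sketch in Node. The function names and the choice of 1.1.1.1 as the comparison resolver are assumptions for illustration, not something Cloudron provides.

```javascript
'use strict';

const { Resolver } = require('dns').promises;

// Run one MX lookup against a single DNS server and report the outcome.
async function mxVia(server, domain) {
    const resolver = new Resolver();
    resolver.setServers([server]);
    try {
        const records = await resolver.resolveMx(domain);
        return { server, ok: true, detail: records.map((r) => r.exchange).join(', ') };
    } catch (e) {
        return { server, ok: false, detail: e.code };
    }
}

// Turn the two results into a short human-readable verdict.
function verdict(local, external) {
    if (local.ok && external.ok) return 'both resolvers answer';
    if (!local.ok && external.ok) return 'only the local resolver fails: ' + local.detail;
    if (local.ok && !external.ok) return 'only the public resolver fails: ' + external.detail;
    return 'both fail: ' + local.detail + ' / ' + external.detail;
}

// Hypothetical entry point: compare unbound (127.0.0.1) with 1.1.1.1.
async function compare(domain) {
    const [local, external] = await Promise.all([
        mxVia('127.0.0.1', domain),
        mxVia('1.1.1.1', domain)
    ]);
    return verdict(local, external);
}

// Usage (performs real DNS queries):
// compare('gov.bc.ca').then(console.log);
```

If only the local resolver fails, that points at the path between unbound and the domain's nameservers rather than at the nameservers configured in netplan.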
I am not sure why unbound is unable to get the nameservers of that specific domain. Can you edit /etc/unbound/unbound.conf.d/cloudron-network.conf and enable debugging there:
verbosity: 5
log-queries: yes
Then restart unbound and check if we get additional hints.
-
@girish I made the change (and quickly reverted it after seeing the log grow so fast). It was on for a few minutes while I ran the test, and here's the download link for the file (it's 3 MB): https://filesharing.d19.ca/f.php?h=32DPrTGN&d=1
There are hundreds of lines in there for it, it seems. Here are some quick snippets from my very brief review:
It seems the initial NS are found:
2021-11-23T21:24:40+0000 vps-8b86529d unbound[1459245]: [1459245:0] info: incoming scrubbed packet: ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: qr ; QUERY: 1, ANSWER: 0, AUTHORITY: 6, ADDITIONAL: 2
;; QUESTION SECTION:
gov.bc.ca. IN A
;; ANSWER SECTION:
;; AUTHORITY SECTION:
gov.bc.ca. 300 IN NS pubdns-c.spanbc.ca.
gov.bc.ca. 300 IN NS pubdns-k.spanbc.ca.
hcgv77huiaek95dvf2mlh6mgc3747u7d.ca. 300 IN NSEC3 1 1 5 - hcif66mnpd6ucerv6dkg8nodve36k0ma TXT RRSIG ;{flags: optout}
hcgv77huiaek95dvf2mlh6mgc3747u7d.ca. 300 IN RRSIG NSEC3 8 2 3600 20211127185149 20211120210941 6810 ca. CaN+r3F3jFEa+PKhUj1YVtegRPO83dQ9Ak9eFGgi4QCmIsOfTye0EgHad7+a1TtqOkLW6VwVghc6Gh83kecuulKRmM6IFwCMQI/TT/6jN53Mabhm+Zy3PZdqCMeaP2Fjs6PPsXbQVUbw0H/dSBP1l0mdKX72feKSPzQXd92++mA= ;{id = 6810}
j7ndutk162v2aatm9t1tqeeftjri3jcv.ca. 300 IN NSEC3 1 1 5 - j7oh4h2jucnrgkn54kf5t3gj4v55cuel NS DS RRSIG ;{flags: optout}
j7ndutk162v2aatm9t1tqeeftjri3jcv.ca. 300 IN RRSIG NSEC3 8 2 3600 20211129061916 20211122023917 6810 ca. opOLaNq6jn5w8EarGGa5tElQPbywUYC3OW1IJCQjnIwJS8fbO0RDKpE0p+Nv0gndmF8ELCqUJmSuCmRti7FeZDLMvkKzSfmwrx2BILlpiMNBArSswNhI9HbpoW+Dt8Gl+u2/jX7qbOMXNBZEx8Nn/PBrAWWvnwIx3Ur0xgB89Us= ;{id = 6810}
;; ADDITIONAL SECTION:
pubdns-c.spanbc.ca. 300 IN A 142.34.50.57
pubdns-k.spanbc.ca. 300 IN A 142.34.208.20
;; MSG SIZE rcvd: 594
I do see a few of these timeouts though:
2021-11-23T21:24:42+0000 vps-8b86529d unbound[1459245]: [1459245:0] debug: timeout udp
I don't know what these mean exactly, but for reference...
2021-11-23T21:25:10+0000 vps-8b86529d unbound[1459245]: [1459245:0] info: 2vRDCD mod2 pubdns-k.spanbc.ca. A IN
2021-11-23T21:25:10+0000 vps-8b86529d unbound[1459245]: [1459245:0] info: 4RDd mod2 rep gov.bc.ca. NS IN
2021-11-23T21:25:10+0000 vps-8b86529d unbound[1459245]: [1459245:0] info: 5RDdc mod2 rep gov.bc.ca. NS IN
-
@d19dotca In the logs, I see "dnssec status: not expected". Can you try disabling DNSSEC? See https://www.nlnetlabs.nl/documentation/unbound/howto-turnoff-dnssec/ . You can just add
val-permissive-mode: yes
to the unbound config. https://dnssec-analyzer.verisignlabs.com/gov.bc.ca confirms the domain has some DNSSEC errors.
-
@girish I tried this and restarted the unbound server after adding that parameter to /etc/unbound/unbound.conf.d/cloudron-network.conf, but my host commands still fail with the exact same thing.
Current config values:
server:
  port: 53
  interface: 127.0.0.1
  interface: 172.18.0.1
  do-ip6: no
  access-control: 127.0.0.1 allow
  access-control: 172.18.0.1/16 allow
  cache-max-negative-ttl: 30
  cache-max-ttl: 300
  val-permissive-mode: yes
Ran the restart command, but still seems to fail.
-
-
@girish Sent the email from the server's support page and allowed remote access for you. Thank you so much in advance, Girish! Very odd issue, I'd love to know what's going on there.
For what it's worth, I tried changing verbosity to 2 and logging the queries, and it seems my host commands now come back with a SERVFAIL error, whereas before they came back with nothing beyond what's noted earlier. Not sure if that's progress or not, haha. I've gone ahead and set it back, so it's not verbose right now.
Here's what I got after changing the verbosity to 2:
host -t NS gov.bc.ca 127.0.0.1
Using domain server:
Name: 127.0.0.1
Address: 127.0.0.1#53
Aliases:
Host gov.bc.ca not found: 2(SERVFAIL)
-
Trying to debug this further now. I cannot make much sense of the unbound logs. So, I wrote a simple node script to do DNS queries:
#!/usr/bin/env node

'use strict';

const { Resolver } = require('dns').promises;
const resolver = new Resolver();

(async function () {
    try {
        const nameservers = await resolver.resolveMx('the.domain');
        console.log(nameservers);
    } catch (e) {
        console.log('Exception when looking up name server: ', e);
    }
})();
I get:
Exception when looking up name server: Error: queryMx ESERVFAIL the.domain
    at QueryReqWrap.onresolve [as oncomplete] (internal/dns/promises.js:169:17) {
  errno: undefined,
  code: 'ESERVFAIL',
  syscall: 'queryMx',
  hostname: 'the.domain'
}
So, it's not an unbound issue but a general network issue. Trying to see what else we can try here. Of course, replacing the.domain with something like cloudron.io works. So, the problem is the network connectivity between this server and the nameservers of this specific domain.
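One way to test that theory is to point a resolver straight at the domain's authoritative nameservers, bypassing unbound. This is a sketch only, assuming the nameserver IPs are already known (for gov.bc.ca, the unbound log earlier in this thread lists 142.34.50.57 and 142.34.208.20). The helper names and the 5-second timeout are my own choices, and the explanation strings paraphrase the Node dns error codes.

```javascript
'use strict';

const { Resolver } = require('dns').promises;

// Rough meaning of the Node dns error codes seen in this thread.
function explain(code) {
    switch (code) {
    case 'ETIMEOUT': return 'timed out while contacting the DNS server';
    case 'ESERVFAIL': return 'the DNS server replied with a general failure (SERVFAIL)';
    case 'ECONNREFUSED': return 'could not contact the DNS server';
    default: return 'unexpected error: ' + code;
    }
}

// Ask each nameserver IP directly for the MX records of `domain`.
async function probeNameservers(domain, nameserverIps) {
    const results = [];
    for (const ip of nameserverIps) {
        const resolver = new Resolver({ timeout: 5000 });
        resolver.setServers([ip]);
        try {
            const records = await resolver.resolveMx(domain);
            results.push({ ip, ok: true, info: records.length + ' MX record(s)' });
        } catch (e) {
            results.push({ ip, ok: false, info: explain(e.code) });
        }
    }
    return results;
}

// Usage (performs real DNS queries):
// probeNameservers('gov.bc.ca', ['142.34.50.57', '142.34.208.20']).then(console.table);
```

A timeout from every nameserver IP, while the same script works from another network, would be consistent with a connectivity problem between this server and those nameservers.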