Regular short getaddrinfo EAI_AGAIN outages

mazarian

That's a fantastic idea! I will add it. I ended up migrating it off Cloudron for the time being because I have come to depend on Uptime Kuma for work and all the notifications were bogging me down.

I will restart the old instance to test and will report back what I find out.

jrl-abstract27

I have the same kind of short outages with Uptime Kuma. I added Cloudron as a monitor to see. Any idea what is happening?

I have a dedicated server with Hetzner.

CleanShot 2024-04-29 at 19.55.51@2x.png

girish

@jrl-abstract27 this is to do with the local DNS server (unbound) not resolving. In Cloudron 8 (the next release), we are removing unbound altogether and it will use your network's resolver via systemd-resolved. Maybe this issue gets sorted out with that.

jrl-abstract27

thanks @girish

factord

@girish said in Regular short getaddrinfo EAI_AGAIN outages:

@jrl-abstract27 this is to do with the local DNS server (unbound) not resolving. In Cloudron 8 (the next release), we are removing unbound altogether and it will use your network's resolver via systemd-resolved. Maybe this issue gets sorted out with that.

Hi, I just upgraded to Cloudron 8.0.3 and rebooted the server, and I see Unbound still appears on the services page. Is that normal? Does that mean Uptime Kuma still uses it?

nebulon

unbound is still used in Cloudron, but its usage is drastically reduced now. It is used for directly querying the nameservers to check if DNS records are already in-sync to avoid hitting NXDOMAIN for newly installed apps as well as for email DNS record lookup.

Everything else now uses the default DNS setup of the environment the server is running in.

girish

To sum up what @nebulon said, for Uptime Kuma this means that it uses the system DNS (and not unbound).

factord

@girish Ouch, I still have the EAI_AGAIN error, so I suppose I have to check with Uptime Kuma then. Any suggestions? Maybe our datacenter DNS is overloaded and we should use Google DNS or something like that?

girish

@factord Right, so this means it is either a local DNS issue or an Uptime Kuma issue. A quick idea is to set the server's /etc/resolv.conf to use Google DNS and check whether that mitigates the issue.
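
Before changing anything, it may help to see which resolvers the host is configured with right now. Below is a minimal Python sketch, assuming a standard resolv.conf layout; the function name list_nameservers is made up for illustration and is not part of Cloudron or Uptime Kuma. On hosts using systemd-resolved this file often lists only the local stub (127.0.0.53), in which case resolvectl on the server shows the actual upstream servers.

```python
from pathlib import Path


def list_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Simplification: treat everything after '#' or ';' as a comment.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


if __name__ == "__main__":
    # Assumed default location; adjust if your system keeps it elsewhere.
    path = Path("/etc/resolv.conf")
    if path.exists():
        print(list_nameservers(path.read_text()))
```

If the list shows only the datacenter's resolvers, swapping in a public resolver for a while is a cheap way to see whether the errors follow the resolver.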

thoresson

I've used Uptime Kuma without any problem since April. But for the last 24+ hours I have suddenly started to receive tons of these:

CleanShot 2024-09-24 at 20.58.12@2x.png

The affected hosts are on .se, and one on .social. Some of them are hosted on the same box I run Cloudron on, some on others.

I see that it's not the same error message as in the OP, but similar enough to be related?

According to my logs, my Cloudron was updated to 8.0.4 on August 28, and to 8.0.6 yesterday morning. I don't know for sure, but from the graphs it looks like the problem started as soon as that update had been installed. This is what the week graph looks like for one of the monitored services with problems:

CleanShot 2024-09-24 at 21.15.53@2x.png

joseph

@thoresson Usually, ENOTFOUND is a problem with the DNS. Is the DNS working on the server? You can try host www.cloudron.io from the host (via SSH) and also via the Web Terminal of the Uptime Kuma app. Do they work?

thoresson

@joseph Yes, this is what I get using the web terminal.

CleanShot 2024-09-25 at 20.16.05@2x.png

But could it be that the DNS is going down intermittently?
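
One way to check for intermittent failures is to resolve the same name repeatedly and record only the attempts that fail. The sketch below is a rough Python illustration, not anything Uptime Kuma or Cloudron ships; the names probe and watch are hypothetical, and the resolver and sleep parameters exist only so the loop can be exercised without a network. I am assuming Node's ENOTFOUND roughly corresponds to EAI_NONAME at the getaddrinfo level.

```python
import socket
import time

# Readable names for the resolver error codes discussed in this thread.
ERROR_NAMES = {
    socket.EAI_AGAIN: "EAI_AGAIN (temporary failure, try again)",
    socket.EAI_NONAME: "EAI_NONAME (name not found)",
}


def probe(host, resolver=socket.getaddrinfo):
    """Resolve host once and return (status, seconds_taken)."""
    started = time.monotonic()
    try:
        resolver(host, None)
        status = "ok"
    except socket.gaierror as exc:
        status = ERROR_NAMES.get(exc.errno, f"gaierror {exc.errno}")
    return status, time.monotonic() - started


def watch(host, rounds=10, interval=30.0,
          resolver=socket.getaddrinfo, sleep=time.sleep):
    """Probe host repeatedly and return only the failed attempts."""
    failures = []
    for i in range(rounds):
        status, took = probe(host, resolver)
        if status != "ok":
            failures.append((i, status, round(took, 3)))
        if i < rounds - 1:
            sleep(interval)
    return failures
```

Calling watch("www.cloudron.io") on the host over SSH for an hour or so and comparing the failed rounds with the outage times in Uptime Kuma should show whether the system resolver itself is dropping lookups.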

joseph

@thoresson Which version of Cloudron are you on? Can you upgrade to Cloudron 8.0.6? Starting with 8.0.6, all app containers also use the system DNS. You can find out your DNS servers using resolvectl on the server.

thoresson

I've been on 8.0.6 since Tuesday morning, which incidentally is also when those problems started.

Should I use resolvectl from Uptime Kuma's web terminal?

joseph

@thoresson You should use it on the server via SSH.

thoresson

Netcup's DNS servers are configured for my Cloudron instance.

adamsmith12

@jdaviescoates said in Regular short getaddrinfo EAI_AGAIN outages:

I recently enabled email notifications on my Uptime Kuma and I've noticed there are regular short outages. Something to do with getaddrinfo EAI_AGAIN

Any idea what's causing this and how to resolve it?

Apparently

EAI_AGAIN is a DNS lookup timed out error, which means it is a network connectivity error or a proxy-related error.

According to https://stackoverflow.com/questions/40182121/whats-the-cause-of-the-error-getaddrinfo-eai-again

Oh, and here it says:

EAI_AGAIN means the DNS server replied that it cannot currently fulfill the request. (If you want the hairy details, the RCODE field in the response is set to 2, SERVFAIL.)

There is no single solution because it entirely depends on why the DNS server sends that back. Maybe it's overloaded, maybe the network is down, maybe it got the same reply from its upstream server.

In general, the best you can do is wait a while and try again. Hope that helps.
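
The "wait a while and try again" advice can be sketched in a few lines of Python. This is only an illustration under my own assumptions: resolve_with_retry is a hypothetical helper, the attempt count and delays are arbitrary, and it retries on EAI_AGAIN only, since a name that does not exist will not start existing a second later.

```python
import socket
import time


def resolve_with_retry(host, attempts=4, base_delay=0.5,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host, retrying with exponential backoff on EAI_AGAIN only."""
    for attempt in range(attempts):
        try:
            return resolver(host, None)
        except socket.gaierror as exc:
            last_try = attempt == attempts - 1
            # Anything other than a temporary failure is re-raised at once.
            if exc.errno != socket.EAI_AGAIN or last_try:
                raise
            # Wait 0.5 s, 1 s, 2 s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

A retry like this hides short blips but not the underlying cause, so it is a mitigation rather than a fix for whatever is making the resolver answer with a temporary failure.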

I wonder what the specific issue is in my case (perhaps just network issues with Netcup?)

Same thoughts. It has been addressed in a beautiful manner.
