SMTP: timeout after RCPT TO

tronical

Hi!

We noticed a situation where we're trying to send an email to somebody but mail delivery fails. I debugged this by looking into the mail logs and tried to reproduce the problem myself via telnet to port 25. I think I've diagnosed the issue, and it points to a timeout that I'm wondering is configurable in Cloudron:

The delivery starts by establishing a connection and sending EHLO, MAIL FROM and RCPT TO. After RCPT TO, the server takes a long time to respond with "250 OK", a delay that I think is deliberate.

When Cloudron attempts to deliver the mail, it reports a socket timeout while waiting on RCPT. Eventually, after a few days of retrying, a bounce is generated.

If I connect to the same server from the same host "by hand" via telnet, I can reproduce the situation: after RCPT TO it does indeed sit there for a long time, but eventually "250 OK" is returned.
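
For anyone who wants to measure that delay instead of watching a telnet session, here is a minimal Python sketch using the standard smtplib module. The host name and both addresses are placeholders (assumptions, not values from this thread), and the 600 second socket timeout is simply chosen to be long enough not to give up early.

```python
import smtplib
import time


def time_rcpt(smtp, sender, recipient):
    """Send EHLO, MAIL FROM and RCPT TO on an already connected SMTP
    session and return (reply code, seconds spent waiting for the
    RCPT TO reply)."""
    smtp.ehlo()
    smtp.mail(sender)
    start = time.monotonic()
    code, _message = smtp.rcpt(recipient)
    return code, time.monotonic() - start


if __name__ == "__main__":
    # Placeholder host and addresses: replace them with the MX host of
    # the recipient domain and with real sender/recipient addresses.
    with smtplib.SMTP("mx.example.com", 25, timeout=600) as smtp:
        code, waited = time_rcpt(smtp, "me@example.org", "them@example.com")
        print(f"RCPT TO answered {code} after {waited:.1f}s")
```

No message data is sent: the session ends after RCPT TO, so this only measures how long the server takes to answer.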

So I'm wondering if there's a way to configure that rcpt timeout in Cloudron?

nebulon

This is more @girish's domain, however to try, you could SSH into your server, then run:

docker exec -ti mail /bin/bash

And you can temporarily change for example /run/haraka/config/smtp.ini according to https://haraka.github.io/core/CoreConfig/

I don't really know which values are affecting your situation, but maybe try to adjust inactivity_time. Haraka should reload the config file automatically.

tronical

@nebulon Thanks for the pointer! I dug a little and copied across outbound.ini and added connect_timeout=300 and pool_timeout=300. Haraka indeed appears to reload after editing, according to the logs.

Let's see if this helps.

Do you know by chance how to retry delivery of queued messages when they're already in exponential backoff? I tried -c /run/haraka --qunstick but that didn't quite work (it produced a socket error about connecting to port 2525 of ::0).

Edit: I emptied the queue manually and tried sending a new email. Now I'm getting somewhere: the SMTP server backs me off (Upstream error: 451 Temporary local problem, please try again!), first after 64 seconds, then again after 128 seconds. The third time it worked and the mail was delivered, so it looks like standard greylisting from that point on.
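
The 64 and 128 second waits look like a delay that doubles after each temporary failure. The sketch below only models that pattern: the starting value of 64 seconds and the doubling are assumptions read off the two delays above, not something confirmed from the Haraka source in this thread.

```python
def retry_delays(attempts, first_delay=64):
    """Return the wait in seconds before each retry, assuming the delay
    starts at first_delay and doubles after every temporary failure."""
    return [first_delay * 2 ** i for i in range(attempts)]


if __name__ == "__main__":
    delays = retry_delays(2)
    # Two deferrals (64s, then 128s) before the third, successful attempt.
    print(delays, "total wait:", sum(delays), "seconds")
```

Under that assumption, a mail that is deferred twice and accepted on the third attempt is delayed by a bit over three minutes in total.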

It might be worth bumping the defaults in Cloudron for Haraka on this.

I guess the manual changes I've done to the config file are lost on the next update?

girish

@tronical yes, correct, the changes will get lost when Cloudron updates (or indeed even if the mail container restarts).

From https://github.com/haraka/Haraka/blob/master/docs/Outbound.md, it seems connect_timeout is only for the initial connection, which is probably not the case here (?) since we are well into the SMTP transaction. So, maybe I just need to bump pool_timeout, which has a default of 50s per the docs. At least per my reading, that timeout is for unused connections as opposed to a "non activity"/idle timeout. Let me check the Haraka code.

girish

There is an inactivity_time in https://github.com/haraka/Haraka/blob/master/docs/CoreConfig.md but the default is 300, which is quite big.

But, I also found that idleTimeoutMillis is derived from pool_timeout which in turn is passed here - https://github.com/coopernurse/node-pool#creating-a-pool . In any case, I can bump pool_timeout to 300 for the next release.

tronical

I'm starting to learn to read the logs.

When the delivery failed this was in the log when establishing the connection:

[core] [outbound] acquired socket XXX for outbound::25:A.B.C.D:undefined:50

and when it worked

[core] [outbound] acquired socket XXX for outbound::25:A.B.C.D:undefined:300

The 50 points indeed towards the pool_timeout.
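
A small Python sketch for pulling that trailing number out of such log lines, in case someone wants to check which value is in effect. It assumes the line format shown above and treats the last colon-separated field as the timeout in seconds, which is my reading of the logs and not documented behaviour; the function name is made up for this example.

```python
import re

# Matches the "acquired socket ... for <pool name>" part of the log line.
_ACQUIRED = re.compile(r"acquired socket \S+ for (\S+)")


def trailing_timeout(line):
    """Return the last colon-separated field of the pool name as an int,
    or None if the line does not match or the field is not numeric."""
    match = _ACQUIRED.search(line)
    if match is None:
        return None
    last_field = match.group(1).rsplit(":", 1)[-1]
    return int(last_field) if last_field.isdigit() else None


if __name__ == "__main__":
    line = "[core] [outbound] acquired socket XXX for outbound::25:A.B.C.D:undefined:300"
    print(trailing_timeout(line))
```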

Thanks a lot for your help and looking forward to the next release!

robi

That's a big jump.. can you keep testing to see how long it actually waits? 60s? 62s?

Also have you reached out to their support to see why it's taking so long for the mail server to respond?

tronical

@robi I measured again and it took 30-50 seconds. With the default pool_timeout of 50 that should suffice, yet when I tried multiple times with the defaults it never succeeded. It's possible that it was always close.

I have tried to reach out to their IT support, because at first I thought it was a bug on their end. Their response was that they don't see anything wrong on their end. It's also a prospective business customer/partner, so my interest in being able to exchange emails with them outweighs theirs for sure at this stage. Consequently I'd rather apply generous timeouts than ask their IT to make changes.

tronical

@girish Hi! Quick follow-up on this: I ran into this again (as I'm now using the unmodified, pristine config) and it appears that I do also need connect_timeout=300. I still get exactly the same behavior: backoff after 64 seconds, backoff after 128 seconds, and then the third time it works.

girish

@tronical Is this a company's SMTP server or an ISP? Wondering if we can maybe add this config just for that domain.

I will investigate what is the impact of setting connect_timeout to 5 minutes, which seems so high!

tronical

@girish The MX record resolves to an ISP's server (Strato): smtpin.rzone.de. But I don't know if this particular timeout is domain-specific or applies to all of Strato. I'm worried it might be the former: protocol-wise, the recipient domain is already known at the point of the delay, so the server could apply domain-specific settings for the backoff/delay.
