SMTP: timeout after rctp to:
-
Hi!
We noticed a situation where we're trying to send an email to somebody but mail delivery fails. I debugged this by looking into the mail logs and tried to reproduce the problem myself via telnet to port 25. I think I've diagnosed the issue, and it makes me wonder whether the relevant timeout is configurable in Cloudron:
The delivery starts by establishing a connection and sending EHLO, MAIL FROM and RCPT TO. After RCPT TO, the server takes a long time until it responds with "250 OK", a delay that I think is deliberate.
When Cloudron attempts to deliver the mail, it reports "socket timeout waiting on rcpt". Eventually, after a few days of trying, a bounce is generated.
If I connect to the same server from the same host "by hand" via telnet, I can reproduce the situation: after RCPT TO it does indeed sit there for a long time, but eventually "250 OK" is returned.
So I'm wondering if there's a way to configure that rcpt timeout in Cloudron?
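The manual telnet check can also be timed with a small script. This is only a sketch using Python's standard smtplib; the function names `time_rcpt` and `measure` are made up for illustration, and the host and addresses are whatever you pass in. It aborts the transaction with RSET, so no mail is actually sent.

```python
import smtplib
import time


def time_rcpt(smtp, sender, recipient):
    """Send EHLO, MAIL FROM and RCPT TO; return (seconds, code, message) for RCPT TO."""
    smtp.ehlo()
    smtp.mail(sender)
    start = time.monotonic()
    code, message = smtp.rcpt(recipient)  # this is the step that stalls
    elapsed = time.monotonic() - start
    smtp.rset()  # abandon the transaction so nothing is delivered
    return elapsed, code, message


def measure(host, sender, recipient, timeout=300):
    """Connect to host on port 25 with a generous socket timeout and time RCPT TO."""
    with smtplib.SMTP(host, 25, timeout=timeout) as smtp:
        return time_rcpt(smtp, sender, recipient)
```

Calling `measure` with the recipient's MX host and a real sender and recipient address returns how long the server took to answer RCPT TO, which can then be compared against the configured timeouts.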
-
This is more @girish's domain. However, to experiment, you could SSH into your server and run:
docker exec -ti mail /bin/bash
And you can temporarily change, for example,
/run/haraka/config/smtp.ini
according to https://haraka.github.io/core/CoreConfig/
I don't really know which values are affecting your situation, but maybe try to adjust inactivity_time. Haraka should reload the config file automatically.
-
@nebulon Thanks for the pointer! I dug a little and copied across outbound.ini and added connect_timeout=300 and pool_timeout=300. Haraka indeed appears to reload after editing, according to the logs.
Let's see if this helps.
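For anyone who wants to script the outbound.ini change rather than edit it by hand, here is a rough sketch. It assumes the two keys live in the section-less top part of the file as plain key=value lines, which is how they were added in this thread; the helper name `set_top_level_keys` is made up for illustration and is not part of Haraka or Cloudron.

```python
def set_top_level_keys(ini_text, updates):
    """Return ini_text with each key in `updates` set before the first [section]."""
    lines = ini_text.splitlines()
    # Top-level keys live before the first "[section]" header, if there is one.
    first_section = next(
        (i for i, line in enumerate(lines) if line.strip().startswith("[")),
        len(lines),
    )
    remaining = dict(updates)
    for i in range(first_section):
        stripped = lines[i].strip()
        if stripped.startswith((";", "#")) or "=" not in stripped:
            continue  # skip comments and blank lines
        key = stripped.split("=", 1)[0].strip()
        if key in remaining:
            lines[i] = f"{key}={remaining.pop(key)}"
    # Keys that were not present yet are added just before the first section.
    lines[first_section:first_section] = [f"{k}={v}" for k, v in remaining.items()]
    return "\n".join(lines) + "\n"


example = "enable_tls=true\npool_timeout=50\n"
print(set_top_level_keys(example, {"connect_timeout": 300, "pool_timeout": 300}))
```

The function only transforms text; reading and writing the actual file inside the mail container is left out on purpose.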
Do you know by chance how to retry delivery of queued messages when they're already in exponential backoff? I tried
-c /run/haraka --qunstick
but that didn't quite work (it produced a socket error about connecting to port 2525 of ::0).
Edit: I emptied the queue manually and tried sending a new email. Now I'm getting somewhere: a back off from the SMTP server ("Upstream error: 451 Temporary local problem, please try again!"), first after 64 seconds, then again after 128 seconds. The third time it worked and the mail was delivered, so standard "grey listing" after that.
It might be worth bumping the defaults in Cloudron for Haraka on this.
I guess the manual changes I've done to the config file are lost on the next update?
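The 64 and 128 second waits mentioned above look like a doubling retry schedule. The sketch below only illustrates that pattern; the doubling rule and the starting value are inferred from the two delays observed in this thread, not taken from Haraka's source, and `backoff_delays` is a made-up name.

```python
def backoff_delays(attempts, first_delay=64):
    """Delays in seconds before each retry, doubling every time."""
    return [first_delay * 2 ** i for i in range(attempts)]


delays = backoff_delays(3)
print(delays)           # [64, 128, 256]
print(sum(delays[:2]))  # 192: total wait before the third attempt
```

Under that assumption, a message that is deferred twice goes out a bit over three minutes after the first attempt, plus however long each attempt itself takes.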
-
@tronical yes, correct, the changes will get lost when Cloudron updates (or indeed even if the mail container restarts).
From https://github.com/haraka/Haraka/blob/master/docs/Outbound.md, it seems connect_timeout is only for the initial connection, which is probably not the case here (?) since we are well into the SMTP transaction. So, maybe I just need to bump pool_timeout, which has a default of 50s per the docs. At least per my reading, that timeout is for unused connections as opposed to a "non activity"/idle timeout. Let me check the Haraka code.
-
There is an inactivity_time in https://github.com/haraka/Haraka/blob/master/docs/CoreConfig.md but the default is 300, which is quite big.
But I also found that idleTimeoutMillis is derived from pool_timeout, which in turn is passed here: https://github.com/coopernurse/node-pool#creating-a-pool . In any case, I can bump pool_timeout to 300 for the next release.
-
I'm starting to learn to read the logs
When the delivery failed this was in the log when establishing the connection:
[core] [outbound] acquired socket XXX for outbound::25:A.B.C.D:undefined:50
and when it worked
[core] [outbound] acquired socket XXX for outbound::25:A.B.C.D:undefined:300
The 50 indeed points towards the pool_timeout.
Thanks a lot for your help and looking forward to the next release!
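The two log lines above differ only in their last colon-separated field. A small sketch for pulling that field out of a line follows; reading it as the pool timeout in seconds is the inference made in this thread, not something confirmed from Haraka's documentation, and `trailing_timeout` is a made-up helper name.

```python
def trailing_timeout(log_line):
    """Return the last colon-separated field of the line as an int, or None."""
    last = log_line.rstrip().rsplit(":", 1)[-1]
    return int(last) if last.isdigit() else None


failed = "[core] [outbound] acquired socket XXX for outbound::25:A.B.C.D:undefined:50"
worked = "[core] [outbound] acquired socket XXX for outbound::25:A.B.C.D:undefined:300"
print(trailing_timeout(failed), trailing_timeout(worked))  # 50 300
```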
-
That's a big jump.. can you keep testing to see how long it actually waits? 60s? 62s?
Also have you reached out to their support to see why it's taking so long for the mail server to respond?
-
@robi I measured again and it took 30-50 seconds. With the default pool_timeout of 50 that should suffice, yet when I tried multiple times with the defaults it never succeeded. It's possible that it was always close.
I have tried to reach out to their IT support, because at first I thought it was a bug on their end. Their response was that they don't see anything wrong on their end. It's also a prospective business customer/partner, so my interest in being able to exchange emails with them certainly outweighs theirs at this stage. Consequently I'd rather apply generous timeouts than ask their IT to make changes.
-
@girish Hi! Quick follow-up on this: I ran into this again (as I'm now using the unmodified pristine config) and it appears that I also do need connect_timeout=300. I still get exactly the same behavior: backoff after 64, backoff after 128, and then the third time it works.
-
@girish The MX record resolves to an ISP's server (Strato): smtpin.rzone.de. But I don't know if this particular timeout is domain specific or applies to all of Strato. I'm worried it might be the former: protocol-wise, at the point of the delay the recipient domain is known, so the server could apply domain-specific settings for the backoff/delay.