"Temporary translation failure" error in Mail logs

d19dotca

I saw a first-time error message in my Mail logs today, and was wondering what it means (a Google search didn't turn up anything): Temporary translation failure

2022-05-19T04:11:12.000Z [NOTICE] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD] [core] connect ip=208.117.51.33 port=56950 local_ip=172.18.0.18 local_port=2587
2022-05-19T04:11:25.000Z [INFO] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD] [dnsbl] pass:67a9424b76fd4e5d391a15c2d4409da7.combined.mail.abusix.zone
2022-05-19T04:11:27.000Z [INFO] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD] [helo.checks] helo_host: o1.email.teamsnap.com, pass:bare_ip, dynamic, valid_hostname, rdns_match, host_mismatch
2022-05-19T04:11:29.000Z [INFO] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD] [spf] identity=helo ip=208.117.51.33 domain="o1.email.teamsnap.com" mfrom=<postmaster@o1.email.teamsnap.com> result=None
2022-05-19T04:11:29.000Z [INFO] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD] [spf] scope: helo, result: None, domain: o1.email.teamsnap.com
2022-05-19T04:11:33.000Z [INFO] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD] [tls] secured: cipher=TLS_AES_256_GCM_SHA384 version=TLSv1.3 verified=false
2022-05-19T04:11:33.000Z [INFO] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD] [core]  hook=unrecognized_command plugin=tls function=upgrade_connection params=STARTTLS retval=OK msg=""
2022-05-19T04:11:33.000Z [INFO] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD] [helo.checks] helo_host: o1.email.teamsnap.com, multi: true, pass:bare_ip, dynamic, valid_hostname, rdns_match, host_mismatch, host_mismatch
2022-05-19T04:11:37.000Z [INFO] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD.1] [spf] identity=mfrom ip=208.117.51.33 domain="sg.email.teamsnap.com" mfrom=<bounces+72460-20b1-{clientUsername}={clientDomain}@sg.email.teamsnap.com> result=Pass
2022-05-19T04:11:37.000Z [INFO] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD.1] [spf] scope: mfrom, result: Pass, domain: sg.email.teamsnap.com
2022-05-19T04:11:37.000Z [NOTICE] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD.1] [core] sender <bounces+72460-20b1-{clientUsername}={clientDomain}@sg.email.teamsnap.com> code=CONT msg=""
2022-05-19T04:11:44.000Z [INFO] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD.1] [core]  hook=rcpt plugin=cloudron function=translate_rcpt_to params=<{clientUsername}@{clientDomain}> retval=DENYSOFT msg="Temporary translation failure"
2022-05-19T04:11:49.000Z [NOTICE] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD.1] [core] recipient <{clientUsername}@{clientDomain}> code=DENYSOFT msg="Temporary translation failure" sender="bounces+72460-20b1-{clientUsername}={clientDomain}@sg.email.teamsnap.com"
2022-05-19T04:11:49.000Z [NOTICE] [3AA4E2F0-82A0-44FA-934C-BB3137FE65AD.1] [core] disconnect ip=208.117.51.33 rdns=o1.email.teamsnap.com helo=o1.email.teamsnap.com relay=N early=N esmtp=Y tls=Y pipe=N errors=0 txns=1 rcpts=0/1/0 msgs=0/0/0 bytes=0 lr="450 Temporary translation failure" time=37.077

Any ideas what that error means? It was indeed temporary: the message was successfully received and delivered to the local mailbox about 5 minutes later. It was just an error message I had never seen before, and Google wasn't helping, so I figured I'd throw this out into the universe for future discoverers. It would be good to better understand the error message.
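To spot these soft rejections across a larger log, a small parser can pull out every recipient line whose code is DENYSOFT. This is a minimal sketch that assumes the lines follow exactly the format shown above (timestamp, level, transaction ID in brackets, then "[core] recipient <...> code=... msg=..."). The names RcptResult and soft_rejections are hypothetical and are not part of Cloudron or Haraka.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator

# Matches '[core] recipient <...> code=... msg="..."' lines in the log format above.
RCPT_LINE = re.compile(
    r'^(?P<ts>\S+) \[(?P<level>\w+)\] \[(?P<txn>[^\]]+)\] \[core\] '
    r'recipient <(?P<rcpt>[^>]*)> code=(?P<code>\w+) msg="(?P<msg>[^"]*)"'
)


@dataclass
class RcptResult:
    timestamp: datetime  # timezone-aware, UTC
    transaction: str     # transaction ID from the second bracketed field
    recipient: str
    code: str
    message: str


def soft_rejections(lines: Iterable[str]) -> Iterator[RcptResult]:
    """Yield every recipient line whose code is DENYSOFT; other lines are skipped."""
    for line in lines:
        m = RCPT_LINE.match(line)
        if m is None or m.group("code") != "DENYSOFT":
            continue
        yield RcptResult(
            # fromisoformat() before Python 3.11 does not accept a trailing "Z".
            timestamp=datetime.fromisoformat(m.group("ts").replace("Z", "+00:00")),
            transaction=m.group("txn"),
            recipient=m.group("rcpt"),
            code=m.group("code"),
            message=m.group("msg"),
        )
```

Lines that are not recipient results, such as the "hook=rcpt" line, simply do not match and are skipped.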

girish

This is the message returned when the internal LDAP server is not reachable. Did the box code happen to restart or have some issues in that time frame? You would have to check the box logs.
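The 450 code in the disconnect line (lr="450 Temporary translation failure") is why this was harmless: SMTP treats 4xx replies as transient, so the sending server keeps the message queued and retries later, which fits the delivery about 5 minutes afterwards. The sketch below is a hypothetical model of that decision, not Cloudron's actual translate_rcpt_to code. The LookupOutcome enum, the 550 reply and its text are assumptions for illustration only.

```python
from enum import Enum
from typing import Tuple


class LookupOutcome(Enum):
    """Hypothetical outcomes of looking up a recipient in the user directory."""
    FOUND = "found"              # the address resolved to a local mailbox
    NOT_FOUND = "not_found"      # the directory answered: no such recipient
    UNREACHABLE = "unreachable"  # the directory (e.g. LDAP) could not be queried


def rcpt_reply(outcome: LookupOutcome) -> Tuple[int, str]:
    """Map a lookup outcome to an SMTP reply code and text (hypothetical model)."""
    if outcome is LookupOutcome.FOUND:
        return 250, "OK"
    if outcome is LookupOutcome.NOT_FOUND:
        # 5xx is a permanent failure: the sender should bounce the message.
        return 550, "No such user"
    # 4xx is a transient failure: the sender keeps the message queued and retries.
    return 450, "Temporary translation failure"


def sender_should_retry(code: int) -> bool:
    """RFC 5321 treats 4yz replies as transient negative completions."""
    return 400 <= code < 500
```

The key design point is that "could not look up" must not be reported the same way as "user does not exist", otherwise a brief directory outage would bounce legitimate mail.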

d19dotca

@girish Ah very interesting! So I ran through the box logs and found this:

2022-05-19T04:11:52.731Z box:apphealthmonitor app health: 39 alive / 0 dead.
2022-05-19T04:12:07.705Z box:apphealthmonitor setHealth: 492b4264-fea8-4d52-bbe6-1f15ce72c53a (<appHostname>) waiting for 1184.295 to update health
ServerError [ServiceUnavailableError]: Response timeout
    at IncomingMessage.<anonymous> (/home/yellowtent/box/node_modules/connect-timeout/index.js:84:8)
    at IncomingMessage.emit (node:events:526:28)
    at Timeout._onTimeout (/home/yellowtent/box/node_modules/connect-timeout/index.js:49:11)
    at listOnTimeout (node:internal/timers:559:17)
    at processTimers (node:internal/timers:502:7) {
  code: 'ETIMEDOUT',
  timeout: 20000
}
Box GET /api/v1/apps 500 Internal Server Error Response timeout 21417.204 ms - 72

And when I converted the UTC time to local time, 04:00 UTC is 9 PM Pacific Time, which is one of the two times I have backups running each day. I wonder if the backup job froze the system for a short while? It seems a little strange if true, though, as there were never any OOM messages, and no other apps had issues (from what I can see so far, anyway). Any suggestions on how I could avoid this in the future?
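The UTC-to-Pacific conversion can be double-checked with Python's zoneinfo module (Python 3.9+). This sketch assumes America/Los_Angeles is the relevant zone and that the system has time zone data installed; to_pacific is a hypothetical helper name.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def to_pacific(iso_utc: str) -> datetime:
    """Convert a log timestamp such as '2022-05-19T04:11:44.000Z' to Pacific time."""
    # fromisoformat() before Python 3.11 does not accept a trailing "Z".
    utc = datetime.fromisoformat(iso_utc.replace("Z", "+00:00"))
    return utc.astimezone(PACIFIC)


print(to_pacific("2022-05-19T04:11:44.000Z").isoformat())
# 2022-05-18T21:11:44-07:00
```

Note that the local date rolls back a day: 04:11 UTC on May 19 is 9:11 PM PDT on May 18.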

FWIW, the backup didn't complete until about 27 minutes past the start of the hour, and there certainly wasn't any freezing going on for that length of time (my monitors would have picked that up if any of the apps weren't responding for more than 2 minutes).

2022-05-19T04:27:28.863Z box:shell startTask (stdout): Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 27min 27.287s
2022-05-19T04:27:28.905Z box:shell startTask (stdout): Service box-task-15237 finished with exit code 0
2022-05-19T04:27:28.923Z box:tasks startTask: 15237 completed with code 0 and signal 0
2022-05-19T04:27:28.924Z box:locker Released : full_backup
2022-05-19T04:27:28.924Z box:tasks startTask: 15237 done. error: null
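The task log gives both the backup's finish time and its systemd "Service runtime", so the start time, and whether the mail failure fell inside the backup window, can be worked out directly. A minimal sketch using only the timestamps quoted in this thread:

```python
from datetime import datetime, timedelta

# Values copied from the box task log: finish time and systemd "Service runtime".
backup_end = datetime.fromisoformat("2022-05-19T04:27:28.863+00:00")
backup_runtime = timedelta(minutes=27, seconds=27.287)
backup_start = backup_end - backup_runtime

# The DENYSOFT on the rcpt hook, from the mail log.
failure = datetime.fromisoformat("2022-05-19T04:11:44.000+00:00")

print(backup_start.isoformat())               # 2022-05-19T04:00:01.576000+00:00
print(backup_start <= failure <= backup_end)  # True
print(failure - backup_start)                 # 0:11:42.424000
```

So the soft rejection happened roughly 11 minutes into a backup that began at about 04:00:01 UTC. That is consistent with the backup being involved, but on its own it does not prove it.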

jdaviescoates

@d19dotca said in "Temporary translation failure" error in Mail logs:

my monitors would have picked that up if any of the apps weren't responding for more than 2 minutes

Just out of interest, how have you got your monitoring set up? Are you using Uptime Kuma or some other Cloudron app for this? Or something else entirely? Thanks!

d19dotca

@jdaviescoates said in "Temporary translation failure" error in Mail logs:

Just out of interest, how have you got your monitoring set up? Are you using Uptime Kuma or some other Cloudron app for this? Or something else entirely? Thanks!

I’m using Uptime Robot for now, with Instatus for the status page, but I have also been testing Uptime Kuma locally. I don’t think I could switch over to Uptime Kuma for certain until it has a subscription feature that lets my clients receive automatic notifications about the services they pay for.

girish

@d19dotca The timeout issue very much looks like the box code lost its connection to the MySQL database and then later recovered (can't be 100% sure). Currently, we only watch for OOM messages from the containers and from the box code itself; we don't track OOM messages from the database. For this, can you check the dmesg output and see if there is anything there? Usually the kernel writes an OOM dump (unless you or your VPS provider has disabled this), and it stands out in the dmesg output.
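Kernel OOM-killer activity usually appears in dmesg with phrases such as "invoked oom-killer" or "Out of memory: Killed process". A minimal filter over such lines, assuming these phrasings (the exact wording varies between kernel versions); oom_lines is a hypothetical helper:

```python
import re
from typing import Iterable, List

# Typical kernel OOM-killer phrasings; exact wording varies by kernel version.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|out of memory|oom-kill:|killed process \d+",
    re.IGNORECASE,
)


def oom_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like OOM-killer activity, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

For example, pass it the lines of "dmesg -T" output (which may require root on systems with kernel.dmesg_restrict enabled) or the lines of /var/log/kern.log.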

Apps won't go down when MySQL goes down, but new users cannot log in via LDAP. The dashboard might be out for a couple of seconds, but all this will be very hard to notice, since apps keep working with their existing "sessions" and Cloudron's dashboard is a single-page app (SPA).

d19dotca

@girish said in "Temporary translation failure" error in Mail logs:

can you check dmesg output and see if there is anything there? Usually the kernel dumps an oom dump (unless you or your VPS provider has disabled this) and it stands out in dmesg output.

I just checked the dmesg*, kern* and syslog* files under /var/log/ but don't see any OOM or "out of memory" errors, kernel panics, or anything else that seems related (IMO). Nothing about MySQL either.
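Rotated copies of these logs are often gzip-compressed (for example kern.log.2.gz), which a plain text search can miss. This sketch searches both plain and .gz files matching the given glob patterns for a case-insensitive regex. The default paths mirror the ones checked above, and search_logs is a hypothetical helper. Reading these files typically needs root (or adm group membership on Debian/Ubuntu); unreadable files are skipped.

```python
import glob
import gzip
import re
from typing import Iterator, Sequence, Tuple

DEFAULT_GLOBS = ("/var/log/dmesg*", "/var/log/kern*", "/var/log/syslog*")


def search_logs(pattern: str, globs: Sequence[str] = DEFAULT_GLOBS) -> Iterator[Tuple[str, str]]:
    """Yield (path, line) for every line matching `pattern` (case-insensitive)
    in plain or gzip-compressed files matching any of `globs`."""
    regex = re.compile(pattern, re.IGNORECASE)
    for spec in globs:
        for path in sorted(glob.glob(spec)):
            # Rotated logs such as kern.log.2.gz are compressed; open them transparently.
            opener = gzip.open if path.endswith(".gz") else open
            try:
                with opener(path, "rt", errors="replace") as fh:
                    for line in fh:
                        if regex.search(line):
                            yield path, line.rstrip("\n")
            except OSError:
                # Permission denied, a directory, a corrupt .gz, etc.: skip this file.
                continue
```

Usage, for example: for path, line in search_logs(r"mysqld|out of memory"): print(path, line)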

@girish said in "Temporary translation failure" error in Mail logs:

Currently, we only watch for OOM messages from the containers and also the box code itself. We don't track OOM messages from the database.

How would one monitor the database for connectivity issues? I think that's locked down to only be accessed locally from Docker, so I can't use an external monitoring solution in that case right? I looked at the /var/log/mysql/error.log file and it's empty, FWIW.

girish

@d19dotca said in "Temporary translation failure" error in Mail logs:

How would one monitor the database for connectivity issues? I think that's locked down to only be accessed locally from Docker, so I can't use an external monitoring solution in that case right? I looked at the /var/log/mysql/error.log file and it's empty, FWIW.

/var/log/mysql/error.log is the correct file. There are actually 2 MySQL instances - one that runs outside of docker (this is used by the box code) and one that runs inside docker (this is the 'addon' used by apps).
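Because the host MySQL instance (the one outside Docker) is not exposed externally, one option is to probe it locally, from cron for example, and forward the result to an external push/heartbeat-style monitor. This sketch only checks that a TCP connection succeeds. It assumes the host instance listens on 127.0.0.1:3306 (MySQL's default port, not confirmed for Cloudron in this thread), and tcp_reachable is a hypothetical helper. A successful connect does not prove queries work; "mysqladmin ping" is a more thorough check.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only proves the port accepts connections, not that MySQL answers queries.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out (socket.timeout is an OSError subclass), unreachable, etc.
        return False


# Example (assumed address): tcp_reachable("127.0.0.1", 3306)
```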

Usually, the MySQL logs do show something when it restarts. Maybe the connectivity issue was caused by something else in your instance. Is this error happening periodically or was this a one off?

d19dotca

@girish That issue appears to have happened again two days ago. It first occurred on May 19th, and most recently I see occurrences of Temporary translation failure on May 20, 2022 around 11:50 PM Pacific Time. So it's definitely infrequent, but it appears to be recurring periodically.
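Since the box and mail logs are in UTC, it helps to turn the approximate local time of the recurrence into a UTC search window. A minimal sketch, assuming America/Los_Angeles and a window of plus or minus 10 minutes (both arbitrary choices); utc_window is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")


def utc_window(local_wall_clock: datetime, minutes: int = 10) -> Tuple[datetime, datetime]:
    """Return a (start, end) UTC window centered on a naive Pacific wall-clock time."""
    center = local_wall_clock.replace(tzinfo=PACIFIC).astimezone(timezone.utc)
    delta = timedelta(minutes=minutes)
    return center - delta, center + delta


start, end = utc_window(datetime(2022, 5, 20, 23, 50))
print(start.isoformat(), end.isoformat())
# 2022-05-21T06:40:00+00:00 2022-05-21T07:00:00+00:00
```

So the May 20 late-evening occurrence should be looked for in the UTC logs on May 21, around 06:50.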

Cloudron Forum
