Unusual Mail Issues
-
@michaelpope What clients are they using?
-
@girish Thunderbird, Outlook 2007, Outlook 365. Only receiving the issues on Outlook 2007 and Thunderbird, and only on some clients, so it seems to be tied to certain accounts.
I'm trying a few things to see if there are corrupt mails in inboxes though... this might be tied to that instead. I'll let you know.
-
Update.
Had some more issues early today even after the export, deletion, re-creation, and import of the mailboxes.
However... due to some circumstances, I had to power off the VPS. When I rebooted it, I reduced the mail process down to 1GB of memory (raising it to 3GB had caused the crash)... and have not had any issues since then. Crossing fingers it stays that way.
top is showing about 50% memory use, with only a little swap, which is fine. Hopefully I can get to no swap use in the future though.
However, the real test will be seeing what happens on Monday and Tuesday. If it's good on those days, I will be very happy :).
-
Some updates:
The issue is still happening despite the server reboot.
I was looking into disk issues, but I'm not so sure of that anymore.
Some things to note:
Sogo Webmail is still working when the issue occurs. This leads me to conclude a few things:
- It means the issue is probably in 1 of 4 places:
  1. It could be an issue in how port 9393 is forwarded over to 993 (or at least I'm assuming there is port forwarding from 9393 to 993; I can check whether this is the source the next time the issue occurs by using Sogo to connect to both ports).
  2. It could be an issue in how the mail is tied to the domain somehow. Sogo attempts to connect to the domain mail and it works fine. IMAP connects to my.insertdomainnamehere.com. (Once again, I can check this with Sogo the next time there is an issue.)
  3. There could be an issue in how dovecot is handling long-lasting TCP connections (think the ones that go to IDLE). I don't know how I'd test this though, tbh.
  4. I suppose there could still be an issue in the underlying storage (an RDMA) that is causing issues, though I don't see why it wouldn't list any errors in the logs at that point.
- It does mean that it's probably not an auth issue as Sogo is logging in fine
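A rough way to run the port check from a shell instead of through Sogo is to ask each port for its IMAP greeting. This is only a sketch: it assumes both ports speak implicit TLS and that openssl and GNU timeout are installed, and the function names are made up for illustration.

```shell
#!/bin/sh
# Ask an IMAPS endpoint for its greeting, then log out again.
# Assumption: the port uses implicit TLS (as 993 normally does).
imap_greeting() {
    # $1 = host, $2 = port; give up after 10 seconds if nothing answers.
    printf 'a1 LOGOUT\n' |
        timeout 10 openssl s_client -connect "$1:$2" -crlf -quiet 2>/dev/null |
        tr -d '\r' |
        head -n 1
}

# Print a one-line verdict; return 0 only if the server sent "* OK".
check_imap() {
    greeting=$(imap_greeting "$1" "$2")
    case "$greeting" in
        '* OK'*) echo "$1:$2 up: $greeting" ;;
        *)       echo "$1:$2 no IMAP greeting"; return 1 ;;
    esac
}
```

Running check_imap my.insertdomainnamehere.com 993 and then the same call with 9393 while a desktop client is failing should show whether only one of the two paths stops answering.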
In the meantime... probably gonna cronjob supervisorctl restart dovecot every few minutes. Won't be perfect, but it restarts within a second and is a lot quicker than restarting the entire mail process, so it probably won't be noticeable for end users.
@girish I was wondering... perhaps the 'mail' docker image is having an issue (I doubt this, but I guess it's a possibility). Is there a way to re-download the docker mail container for Cloudron without causing any issues? (Should I just docker image pull it, or would that break things?)
-
An update on those possible issues.
Port 993 over SSL worked in SOGO, even when a desktop client was having issues.
This means that it is likely an issue with (3). Next, I'm going to see if I can change dovecot.conf so dovecot kills long-lasting connections quicker, and see whether that makes the problem go away or exacerbates it.
-
Another update. Updated the 2007 Outlook Client to Office 365. Haven't had any issues with them so far, but need to wait to be sure. There's a small chance that this whole thing was actually two separate email client issues, one with Thunderbird, and one with Outlook.
Certain versions of Outlook actually have an issue where they don't respond to Dovecot's request to see if they are 'still there' and if an IDLE IMAP connection is closed, they will go offline automatically (https://wiki.dovecot.org/Timeouts and also see point 2 and 3 of the last post in https://social.technet.microsoft.com/Forums/exchange/en-US/b418d42a-ce66-4cfa-adb9-e6e24ec769f4/outlook-2007-error-your-imap-server-closed-the-connection). I'm thinking this might have been the issue. Not seeing this issue on Office 365 clients other users are running.
As for the Thunderbird issue... found one account of someone running into something similar here: https://www.mail-archive.com/dovecot%40dovecot.org/msg82805.html . Still gotta work on that one, and do some measurements to see if we are hitting the process limit when we have the issue. This would be a weird case though as we are only seeing this issue with some Thunderbird clients, despite all running the same version and with similar software on the computers... which again points to a mailbox level issue. Gotta do some more sleuthing.
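For the measurements, one possible starting point is to log how many IMAP sessions dovecot thinks it has at the moment a client fails. The sketch below assumes doveadm is available inside the mail container and that doveadm who prints its usual columns (username, count, proto, pids, ips); the function name is hypothetical.

```shell
#!/bin/sh
# Sum the connection counts that `doveadm who` reports for IMAP sessions.
# Assumption: doveadm exists in the "mail" container and prints a header
# line followed by rows of: username, count, proto, (pids), (ips).
count_imap_sessions() {
    docker exec mail doveadm who |
        awk 'NR > 1 && $3 == "imap" { total += $2 } END { print total + 0 }'
}
```

Comparing that number during a failure with the number during normal operation would at least show whether sessions are piling up toward a limit.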
-
Just wanted to give everyone an update.
Last week was pretty crazy for me, so I wasn't able to do much testing.
I will probably resume doing some more tests this week. In any case, restarting dovecot every hour does seem to be a suitable mitigation for anybody who has this issue (at least until a solution is found).
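For anyone who wants to set up the hourly restart, here is a minimal sketch of a wrapper script to call from cron on the host. The container name and the supervisorctl command are the ones used in this thread; the script path, the log path, and the --run flag are placeholders picked for the example.

```shell
#!/bin/sh
# Restart dovecot inside the Cloudron mail container and log the outcome.
# LOG_FILE is a placeholder path; override it in the environment if needed.
LOG_FILE="${LOG_FILE:-/var/log/dovecot-restart.log}"

restart_dovecot() {
    if docker exec mail supervisorctl restart dovecot >>"$LOG_FILE" 2>&1; then
        echo "$(date '+%F %T') dovecot restarted" >>"$LOG_FILE"
    else
        echo "$(date '+%F %T') dovecot restart FAILED" >>"$LOG_FILE"
        return 1
    fi
}

# Only act when called with --run, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
    restart_dovecot
fi

# Example root crontab entry (hourly), assuming the file is saved as
# /usr/local/bin/restart-dovecot.sh and is executable:
#   0 * * * * /usr/local/bin/restart-dovecot.sh --run
```

Keeping a log of each restart makes it easier to line up the restarts with the times users report problems.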
-
@michaelpope Keep up the good work. Updates like what you are providing are invaluable for those who come later with the same, or similar, issues.
-
Another update, a bit further down the line.
Here's what I discovered so far:
- Some older Outlook clients just don't disconnect. As such, they can sometimes run into issues where the server times out. Best way to fix this is to occasionally run docker exec mail supervisorctl restart dovecot.
- Modern Thunderbird clients might occasionally run into a situation where they can't connect to the server very consistently (except maybe on the Inbox). Best solution is to close Thunderbird and then run docker exec mail supervisorctl restart dovecot as well. The client needs to be closed when you do this.
-
@michaelpope this seems like maybe dovecot is unable to accept new connections and restarting it fixes it.
The next time you hit this: can you check the contents of /run/dovecot/dovecot.log in the mail container? That should have some error as to why dovecot is not accepting new connections.
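One way to pull the interesting lines out of that log from the host might look like the sketch below. The container name and log path are the ones mentioned above; the keyword filter and the function name are just guesses at a reasonable first pass.

```shell
#!/bin/sh
# Show recent problem lines from dovecot's log inside the mail container.
# $1 = how many trailing log lines to scan (defaults to 200).
# The keyword list is an assumption, not an exhaustive set of log levels.
dovecot_log_errors() {
    docker exec mail tail -n "${1:-200}" /run/dovecot/dovecot.log |
        grep -iE 'error|fatal|panic|warning'
}
```

Calling dovecot_log_errors 500 right after a client fails, and before restarting dovecot, gives the best chance of catching the relevant message.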
-