De-duplicating e-mails
-
Unfortunately, I still have an issue left after restoring my server today.
When I restored my Cloudron, e-mails were only partially restored. I then used imapsync to finish the job, which it did. However, I am fairly certain it didn't recognise the e-mails that were already in the account and copied them again. Now all e-mails appear to be there, but I have about 20GB of extra data that will slow down every future backup unless I do something about it.
What I have already tried:
- imapsync with --delete2duplicates --useheader "Message-Id"
- imapdedup (a Python tool I found for that purpose)
They deleted a few duplicates, but nothing significant. One user account still has 80% more data than before, and I don't understand what else could be the cause.
Ideally I would like to avoid restoring from backup now, because the e-mails that came in over the course of the day would then be lost.
-
Is it possible that emails were added by mail clients as they started syncing again after the restore? And maybe these email clients didn’t recognize the duplicates?
-
Could be! But how should I resolve that? --delete2duplicates did find a few thousand e-mails, and everything seems to be there, but I still have about 20GB more than before.
@ekevu123 said in De-duplicating e-mails:
"still have about 20GB more than before."
Based on what? Note that Cloudron graphs are not live.
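One way to get live numbers instead of relying on the graphs is to ask the IMAP server directly. The following is only a rough Python sketch using the standard imaplib and email modules, not something taken from Cloudron or imapsync. It sums the size the server reports for each message in a folder and counts how many bytes belong to messages whose Message-ID has already been seen. The function names (fetch_folder, duplicate_report) are made up for illustration, and the host, user, password and folder are placeholders you would have to fill in yourself. Messages without a Message-ID header are only counted, not compared, which may be one reason a Message-ID based cleanup misses some copies.

```python
import email
import imaplib
import re

SIZE_RE = re.compile(rb"RFC822\.SIZE (\d+)")


def duplicate_report(messages):
    """Summarise (size_in_bytes, raw_header_bytes) pairs.

    Returns (total_bytes, duplicate_count, duplicate_bytes, missing_id_count).
    A message is a duplicate if its Message-ID was already seen earlier.
    """
    seen = set()
    total = dup_count = dup_bytes = missing = 0
    for size, raw in messages:
        total += size
        msg = email.message_from_bytes(raw)
        mid = str(msg.get("Message-ID") or "").strip()
        if not mid:
            # No Message-ID: cannot be matched by this method.
            missing += 1
        elif mid in seen:
            dup_count += 1
            dup_bytes += size
        else:
            seen.add(mid)
    return total, dup_count, dup_bytes, missing


def fetch_folder(conn, folder):
    """Yield (size, header_bytes) for every message in one folder.

    `conn` is a logged-in imaplib.IMAP4_SSL object. The folder is opened
    read-only, so nothing is modified or marked as read.
    """
    conn.select(f'"{folder}"', readonly=True)
    typ, data = conn.uid(
        "FETCH", "1:*", "(RFC822.SIZE BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)])"
    )
    if typ != "OK":
        return
    for i, item in enumerate(data):
        if not isinstance(item, tuple):
            continue
        # The size may arrive before or after the header literal.
        nxt = data[i + 1] if i + 1 < len(data) else b""
        trailer = nxt if isinstance(nxt, bytes) else b""
        m = SIZE_RE.search(item[0]) or SIZE_RE.search(trailer)
        if m:
            yield int(m.group(1)), item[1]
```

To use it, you would log in with something like conn = imaplib.IMAP4_SSL("mail.example.com") followed by conn.login(user, password), then call duplicate_report(fetch_folder(conn, "INBOX")) for each folder. This only looks within a single folder. Copies of the same message in different folders are not detected by this sketch.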
-
-