Backup Improvements: Restic Backend
-
@robi yes definitely.
I do know that people want urgent fixes when backups and restores are not working, I just don't know how we can take up this responsibility. If my production site is down and it doesn't restore, what now? This will cost us real money because people will ask for a refund.
@girish said in Backup Improvements: Restic Backend:
I do know that people want urgent fixes when backups and restores are not working,
Yes, however be aware of fear based thinking here. It's not the end of the world, previous backups exist and the situation is temporary.
I just don't know how we can take up this responsibility.
It isn't your responsibility, even though you empathize and do a lot for us to make it easier. External backups are an external responsibility and best effort support here, which you do well. So, no fear.
If my production site is down and it doesn't restore, what now? This will cost us real money because people will ask for a refund.
It's like any other issue, you and we will find a way. This is the way.
Again, avoid decisions out of fear. And no, it will not cost you money (you're not hurting there), as it's not a Cloudron issue, but an external one. You can relax and trust your loyal community to show you the way forward, as @fbartels and @necrevistonnezr already have.
-
https://forum.restic.net/t/fatal-packs-from-index-missing-in-repo/4869 --> https://github.com/restic/restic/issues/828#issuecomment-706186047 --> https://restic.readthedocs.io/en/stable/077_troubleshooting.html
Judging from the background story in Very slow restic prune, my guess would be that the S3 bucket listing is incomplete from time to time. So a quite likely explanation is that the list of pack files which prune gets is incomplete, hence the missing file errors. As far as I remember, minio has options to ensure a consistent file listing, or maybe you’re encountering some timeout.
-
For what it's worth, I have had zero problems in the 7 years I've been using restic on my server, with around 380 GB of data backed up to OneDrive (via rclone)...
I do regular tests with restic and spot checks via restic-browser.
-
@necrevistonnezr how have you been testing backups? I have a raspberry pi at home, set up with restic/rclone backups as well. Once in a while I mount one of the latest snapshots and check a bit randomly if things "look ok". I'd like to have a better system to check if backups are okay, so any pointers would be appreciated.
On a more cloudron-related note, @girish, is there a way people test restoring backups on their cloudrons? How would one go about testing if restore will run okay when needed?
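One way to make the "mount a snapshot and look around" routine a bit more systematic is to hash a random sample of files in both places and compare. The following is only a rough sketch under stated assumptions: `spot_check`, `source_dir`, and `snapshot_dir` are hypothetical names, the snapshot is assumed to be already mounted or restored to a local directory, and the directory layout under both roots is assumed to match. Files that changed after the snapshot was taken will also be reported, so results need a human look.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(source_dir, snapshot_dir, sample_size=20, seed=None):
    """Return relative paths from a random sample that are missing from,
    or differ in, snapshot_dir (hypothetical helper, not part of restic)."""
    source = Path(source_dir)
    snapshot = Path(snapshot_dir)
    # Collect every regular file below the source root, relative to it.
    files = sorted(p.relative_to(source) for p in source.rglob("*") if p.is_file())
    rng = random.Random(seed)
    sample = rng.sample(files, min(sample_size, len(files)))
    problems = []
    for rel in sample:
        restored = snapshot / rel
        if not restored.is_file() or sha256_of(source / rel) != sha256_of(restored):
            problems.append(str(rel))
    return sorted(problems)
```

An empty list means every sampled file was present and identical; anything else is a list of paths worth inspecting by hand.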
-
OK, after 4 hours...
checkPack: Load: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 507.606314ms: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 985.229971ms: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 803.546856ms: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 1.486109007s: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 2.070709754s: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 3.67875363s: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 4.459624189s: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 6.775444383s: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 15.10932531s: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 13.811796615s: The specified key does not exist.
checkPack: Load: The specified key does not exist.
[2:36:37] 100.00% 57800 / 57800 packs
Fatal: repository contains errors
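With a long check run it can help to boil output like this down to the distinct files that failed to load. Below is a minimal sketch that assumes the output has been saved to a string and that the retry lines look exactly like the ones in this post; the regular expression and the `failed_loads` name are my own, not anything restic provides.

```python
import re
from collections import Counter

# Matches retry lines shaped like the ones in this thread, e.g.
# "Load(<data/4a8f87b965>, 0, 0) returned error, retrying after ..."
LOAD_ERROR = re.compile(
    r"Load\(<(?P<kind>\w+)/(?P<id>[0-9a-f]+)>, \d+, \d+\) returned error"
)


def failed_loads(log_text: str) -> Counter:
    """Count failed Load attempts per (file type, short ID)."""
    return Counter(
        (m.group("kind"), m.group("id")) for m in LOAD_ERROR.finditer(log_text)
    )
```

For the output in this post, every retry refers to the same short ID, so the result would contain a single key, `("data", "4a8f87b965")`. That suggests one pack file is missing from the bucket (or from its listing) rather than many.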
-
@malvim said in Backup Improvements: Restic Backend:
On a more cloudron-related note, @girish, is there a way people test restoring backups on their cloudrons? How would one go about testing if restore will run okay when needed?
Yes, use dry-run - https://docs.cloudron.io/backups/#dry-run
For Cloudron 9, we are adding a backup integrity check feature. This will protect against bitrot and also provide some self-validation that all files are being backed up.
-
@malvim said in Backup Improvements: Restic Backend:
@necrevistonnezr how have you been testing backups? I have a raspberry pi at home, set up with restic/rclone backups as well. Once in a while I mount one of the latest snapshots and check a bit randomly if things "look ok". I'd like to have a better system to check if backups are okay, so any pointers would be appreciated.
https://restic.readthedocs.io/en/latest/045_working_with_repos.html#checking-integrity-and-consistency
and, as I said, spot checks for files with restic-browser.
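For anyone who wants to schedule the integrity check from the linked docs, here is a small sketch of a wrapper. Assumptions: `restic` is on the PATH, the repository password is supplied through the environment (for example `RESTIC_PASSWORD_FILE`), and the repository string used in the note below is a made-up example. The function names are hypothetical. `check` and `--read-data-subset` are described on the linked page; see it for the subset forms your restic version accepts.

```python
import subprocess
from typing import List, Optional


def build_check_command(repo: str, read_data_subset: Optional[str] = None) -> List[str]:
    """Build the argument list for `restic check` (hypothetical helper)."""
    cmd = ["restic", "--repo", repo, "check"]
    if read_data_subset:
        # Also read and verify a subset of the pack files, e.g. "1/10".
        cmd.append(f"--read-data-subset={read_data_subset}")
    return cmd


def run_check(repo: str, read_data_subset: Optional[str] = None) -> bool:
    """Run the check and report success based on restic's exit code."""
    result = subprocess.run(build_check_command(repo, read_data_subset), check=False)
    return result.returncode == 0
```

Calling `run_check("rclone:onedrive:backup", "1/10")` from a cron job or timer and alerting when it returns `False` would be one way to turn occasional manual tests into a routine.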