Backup Improvements: Restic Backend
-
@imc67 still working through the backups rewrite, will leave a note here once I have something.
-
@girish after 4 months I'm still curious about the results of the "backups rewrite", as our daily backup now takes more than 5 hours.
-
Any update on the backup methodology? I read a lot of issues in the forum about crashing backups due to several reasons. Maybe this can help?
-
@imc67 we have not started working on a restic backend, if that's what you are asking.
I think most of the issues are simply due to the large variety of S3 providers in use. There are way too many of them. This is made worse by people choosing the cheapest, low-end providers, who don't provide any support either. That, and SSHFS/CIFS flakiness. I doubt moving to restic solves any of this, because these problems are not related to the backup code itself but to infrastructure. Just my thoughts though.
Instead of dealing with our own backup issues, we would then be dealing with the issues of another backend, like say 1, 2 and 3 - these are just restic issues from last week. It would be worse, because we don't have much idea how to deal with restic bugs.
-
FWIW I have had a Raspberry Pi for 3 years with a daily restic backup to a Cloudron Minio S3, and 2 weeks ago I needed a restore for the first time. Most of the time went into rediscovering how I had configured it 3 years ago and how to restore, but it worked flawlessly! I know it's just n=1, but it's worth a look.
-
@girish It could also be the opposite experience, since restic would be handling all the issues, meaning less work for you with custom one-off fixes as they crop up.
@robi yes definitely.
I do know that people want urgent fixes when backups and restores are not working, I just don't know how we can take up this responsibility. If my production site is down and it doesn't restore, what now? This will cost us real money because people will ask for a refund.
-
@girish I think something that could be interesting is the ability to use the Cloudron filesystem dump and then have a "hook" that could post-process the data, possibly capturing whatever that hook produces in a Cloudron notification (to be natively informed if the post-processing fails).
I have set up a few Cloudrons in the last week with a combination of local disk rsync in Cloudron (with local retention and hardlinks) and then using autorestic and cron to push the Cloudron backup via SFTP to a Hetzner Storage Box (optionally as a sub-user). I have even performed a restore test, for which I first restored the previous state of /var/backups before running Cloudron's restore.
Autorestic takes care of retention and streamlines use of the SFTP backend. I even use it to make regular snapshots of the appsdata and boxdata directories in the yellowtent home.
The benefit of the hook would be the extended peace of mind that as soon as Cloudron has finished the backup, it is pushed to its remote location.
@girish said in Backup Improvements: Restic Backend:
This will cost us real money because people will ask for a refund
Yes, restore of backups is crucial. Luckily so far cloudron has not let me down and overall it has been a very stable experience.
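To make the "hook" idea above a bit more concrete, here is a minimal sketch of a wrapper that runs a post-backup command and captures its output so it could be shown in a notification. Cloudron does not offer such a hook as far as this thread states; `run_post_backup_hook` and `HookResult` are hypothetical names, and the command in the demo is only a stand-in for something like an autorestic run.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class HookResult:
    ok: bool        # True when the hook exited with status 0
    exit_code: int  # -1 when the hook could not be run or timed out
    output: str     # combined stdout/stderr, usable as a notification body


def run_post_backup_hook(command: list[str], timeout_s: float = 3600.0) -> HookResult:
    """Run a (hypothetical) post-backup hook and capture what it prints."""
    try:
        proc = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured output
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return HookResult(False, -1, f"hook timed out after {timeout_s}s")
    except OSError as exc:  # e.g. the command does not exist
        return HookResult(False, -1, str(exc))
    return HookResult(proc.returncode == 0, proc.returncode, proc.stdout)


if __name__ == "__main__":
    # Stand-in command; a real hook would push the finished backup to a remote.
    result = run_post_backup_hook([sys.executable, "-c", "print('pushed to remote')"])
    print("OK" if result.ok else "FAILED", result.output.strip())
```

The point of returning the output instead of just logging it is that a failed push can be reported in the same place as the backup result, rather than being noticed days later in a cron mail.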
-
@fbartels said in Backup Improvements: Restic Backend:
I have set up a few cloudrons in the last week with a combination of local disk rsync in cloudron (with local retention and hardlinks) and then using autorestic and Cron to push the cloudron backup via sftp to a hetzner storageboxes (optionally as a subuser). Have even performed a restore test, for which I first restored the previous state of /var/backups before running cloudrons restore.
@fbartels Exactly my setup for a couple of years: https://forum.cloudron.io/topic/6928/tutorial-remote-backup-of-local-cloudron-backup-snapshots-with-restic-rclone/
What's nice is that you can e.g. check and restore files on a different machine running restic & rclone.
-
@girish said in Backup Improvements: Restic Backend:
I do know that people want urgent fixes when backups and restores are not working,
Yes, however be aware of fear-based thinking here. It's not the end of the world; previous backups exist and the situation is temporary.
I just don't know how we can take up this responsibility.
It isn't your responsibility, even though you empathize and do a lot for us to make it easier. External backups are an external responsibility, and support for them is best-effort, which you do well. So, no fear.
If my production site is down and it doesn't restore, what now? This will cost us real money because people will ask for a refund.
It's like any other issue, you and we will find a way. This is the way.
Again, avoid decisions made out of fear. And no, it will not cost you money (you're not hurting there), as it's not a Cloudron issue, but an external one.
You can relax and trust your loyal community to show you the way forward, as @fbartels and @necrevistonnezr already have.
-
https://forum.restic.net/t/fatal-packs-from-index-missing-in-repo/4869 --> https://github.com/restic/restic/issues/828#issuecomment-706186047 --> https://restic.readthedocs.io/en/stable/077_troubleshooting.html
Judging from the background story in "Very slow restic prune", my guess would be that the S3 bucket listing is incomplete from time to time. So a quite likely explanation is that the list of pack files which prune gets is incomplete, hence the missing file errors. As far as I remember, minio has options to ensure a consistent file listing, or maybe you're encountering some timeout.
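The reasoning in that quote can be illustrated with a few lines of set arithmetic. This is only a toy model, not how restic's prune is implemented: `missing_packs` is a hypothetical helper and the pack IDs are made up. It shows why a single incomplete bucket listing makes a pack look missing, and why it disappears from the "missing" set once another listing attempt returns it.

```python
def missing_packs(indexed: set[str], listings: list[set[str]]) -> set[str]:
    """Pack IDs the index references that no listing attempt returned."""
    seen: set[str] = set()
    for listing in listings:
        seen |= listing  # union of everything any listing reported
    return indexed - seen


if __name__ == "__main__":
    indexed = {"aaaa", "bbbb", "cccc"}  # made-up pack IDs referenced by the index
    first = {"bbbb", "cccc"}            # incomplete listing: "aaaa" not returned
    second = {"aaaa", "bbbb"}           # a later listing does return "aaaa"

    print(missing_packs(indexed, [first]))          # {'aaaa'}
    print(missing_packs(indexed, [first, second]))  # set()
```

If a pack is still reported missing after repeated listings, the explanation is more likely a genuinely absent object than a flaky listing.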
-
For what it's worth, I've had zero problems in the 7 years I've been using restic on my server, with around 380 GB of data backed up to OneDrive (via rclone)...
I do regular tests with restic and spot checks via restic-browser.
-
@necrevistonnezr how have you been testing backups? I have a raspberry pi at home, set up with restic/rclone backups as well. Once in a while I mount one of the latest snapshots and check a bit randomly if things "look ok". I'd like to have a better system to check if backups are okay, so any pointers would be appreciated.
On a more cloudron-related note, @girish, is there a way people test restoring backups on their cloudrons? How would one go about testing whether a restore will run okay when needed?
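One way to make the "check a bit randomly" approach above more systematic is to hash a random sample of source files and compare them with the same paths in a mounted or restored snapshot. The sketch below assumes both trees are reachable as local directories; `spot_check` and `sha256_of` are hypothetical helper names, not part of restic or Cloudron.

```python
import hashlib
import random
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(source: Path, restored: Path, sample_size: int = 20,
               seed: Optional[int] = None) -> list[str]:
    """Return sampled relative paths that are missing or differ in `restored`."""
    files = sorted(p.relative_to(source) for p in source.rglob("*") if p.is_file())
    rng = random.Random(seed)  # pass a seed to make a run reproducible
    sample = rng.sample(files, min(sample_size, len(files)))
    problems = []
    for rel in sample:
        candidate = restored / rel
        if not candidate.is_file() or sha256_of(source / rel) != sha256_of(candidate):
            problems.append(rel.as_posix())
    return sorted(problems)
```

Files that legitimately changed after the snapshot was taken will also show up as differences, so this works best against the most recent snapshot or on data that rarely changes.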
-
OK, after 4 hours...
checkPack: Load: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 507.606314ms: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 985.229971ms: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 803.546856ms: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 1.486109007s: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 2.070709754s: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 3.67875363s: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 4.459624189s: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 6.775444383s: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 15.10932531s: The specified key does not exist.
Load(<data/4a8f87b965>, 0, 0) returned error, retrying after 13.811796615s: The specified key does not exist.
checkPack: Load: The specified key does not exist.
[2:36:37] 100.00% 57800 / 57800 packs
Fatal: repository contains errors
-
@malvim said in Backup Improvements: Restic Backend:
On a more cloudron-related note, @girish, is there a way people test restoring backups on their cloudrons? How would one go about testing whether a restore will run okay when needed?
Yes, use dry-run - https://docs.cloudron.io/backups/#dry-run
For Cloudron 9, we are adding a backup integrity check feature. This will protect against bitrot and also provide some self-validation that all files are being backed up.
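For readers wondering what an integrity check of this kind can look like in general, here is a minimal sketch based on a checksum manifest. It is not Cloudron's implementation, which this thread does not describe; `build_manifest` and `verify_manifest` are hypothetical names. A manifest recorded at backup time catches both bitrot (changed digests) and files that were never backed up (missing paths).

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's relative path to its SHA-256 digest.

    Reads each file fully into memory, which is fine for a sketch
    but not for very large files.
    """
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under `root` with a previously recorded manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),      # recorded but gone
        "unexpected": sorted(set(current) - set(manifest)),   # present but never recorded
        "changed": sorted(                                    # digest mismatch, e.g. bitrot
            k for k in manifest.keys() & current.keys() if manifest[k] != current[k]
        ),
    }
```

The manifest would be built from the source data when the backup runs and verified later against the stored or restored copy; all three lists being empty means the copy matches what was recorded.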