Backup failure retry wait times should be shortened from 4 hours
-
I've noticed that backup failures trigger a retry 4 hours later. This 4-hour retry wait doesn't appear to be customizable, and (maybe it's just me) it seems like quite a long time to wait for a retry.
I'd like to request that one of the following be changed or added to Cloudron:
- Customizable option for retry wait time
- Shorten it to somewhere between 30 minutes and 2 hours at most
Unless the failure is caused by a lack of space, where it may take some time before an admin can add more storage, I can't think of a reason why it needs to wait 4 hours for the next attempt.
This has always been on my mind, but it's more apparent now that 5.5.0 lets us customize the backup schedule. In my case, backups run every 6 hours (4 times a day), so a retry 4 hours after a failure leaves just a 2-hour gap before the next scheduled backup (if the retry succeeds). Automatic backups only two hours apart seem a bit pointless.
Hopefully the above makes sense, but please let me know if I can clarify at all.
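The timing in the example above can be checked with a small calculation. This is only a sketch of the arithmetic: the function name is made up for illustration, and it assumes the failed backup started exactly on schedule and that the retry fires a fixed number of hours after that start.

```python
def gap_after_retry(interval_h: float, retry_wait_h: float) -> float:
    """Hours between a retried backup and the next scheduled backup.

    Assumes the failed run started on schedule and the retry fires
    retry_wait_h hours later, before the next scheduled run.
    """
    if not 0 < retry_wait_h < interval_h:
        raise ValueError("retry wait must be positive and shorter than the interval")
    return interval_h - retry_wait_h


# 6-hour schedule from the post, with the current 4-hour wait and the
# shorter waits suggested in the request (2 hours and 30 minutes).
for wait in (4, 2, 0.5):
    print(f"retry after {wait:g} h -> {gap_after_retry(6, wait):g} h until the next scheduled backup")
```

With a 6-hour schedule, the 4-hour wait leaves the 2-hour gap described above, while a 30-minute wait would leave 5.5 hours.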
-
@d19dotca In 5.5, we have removed the retry logic entirely. If the backup fails, it sends a notification and the next backup only runs at the next scheduled time. The retry logic doesn't work well because a retry might land in work hours again, and avoiding that was the whole reason for making the backup schedule configurable in the first place. I think for general network errors there is already separate retry logic in place, independent of this. For other, more involved errors like a mount point not being found, a retry won't help.
-
@girish That's good to hear, but if that's true then I think there may be just a small defect because the notification I receive in Cloudron when it fails still has the line about retrying in 4 hours. Not the email, but the actual notification inside of Cloudron.
-
@d19dotca Ah, you are right. I forgot to fix that!
There are some more changes to the backup system (just putting them here, since they're not obvious):
- The backup is now run with a nice of 15. This makes sure that it gets low priority if the Cloudron is doing other things.
- It's run with a configurable memory limit. This memory limit is in Advanced -> Settings. This is useful if you want faster uploads by increasing the concurrency values, which necessarily means giving the task more memory.
- There is currently a timeout of 12 hours for the task. If people are hitting this limit, I will bump this up.
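To illustrate what a nice of 15 and a 12-hour timeout mean for a task, here is a minimal sketch. It is not how Cloudron runs the backup task: `run_low_priority` is a hypothetical helper, the child command is only a stand-in for a backup, the memory limit is not covered, and the approach is POSIX-only because it relies on `os.nice`.

```python
import os
import subprocess
import sys

NICE_INCREMENT = 15             # from the post: the backup runs with a nice of 15
TIMEOUT_SECONDS = 12 * 60 * 60  # from the post: 12-hour task timeout


def run_low_priority(cmd, nice=NICE_INCREMENT, timeout=TIMEOUT_SECONDS):
    """Run cmd at lowered CPU priority and stop it if it exceeds timeout.

    Hypothetical helper for illustration only; POSIX only.
    """
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(nice),  # runs in the child just before exec
        timeout=timeout,                   # child is killed and TimeoutExpired is raised on overrun
        capture_output=True,
        text=True,
        check=True,                        # raise CalledProcessError on a non-zero exit
    )


# Stand-in for a backup command: a child process that reports its own niceness.
result = run_low_priority([sys.executable, "-c", "import os; print(os.nice(0))"])
print("child niceness:", result.stdout.strip())
```

A higher nice value only lowers CPU scheduling priority relative to other processes; it does not by itself limit memory or disk I/O.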
-
@girish The concurrency limits and related settings need to be added to the docs too. I just went to look up appropriate values and didn't find anything. For what it's worth, I tried the defaults (10 concurrency) on all three of those settings and eventually bumped them to 100, but I didn't notice any real difference. I guess those settings matter more for remote storage than for ext4-mounted disks?
-
@girish said in Backup failure retry wait times should be shortened from 4 hours:
by increasing concurrency values
Where can I find these settings? I only see the memory limit setting (using S3)
-
The backup changes worked out very well for us!!!
Cloudron #1: about 14GB took 35 minutes on 5.4.1 (tgz) and now takes 3 minutes on 5.5 (rsync to external Minio, with the highest concurrency limits and memory).
Cloudron #2: about 38GB took about 1 hour on 5.4.1 (tgz) and now takes 36 minutes on 5.5 (rsync to external Minio, with the highest concurrency limits and memory).
Thanks @girish for this wonderful and welcome update (also for being able to set our own times and intervals)!
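For a rough sense of scale, the reported timings can be turned into wall-clock ratios. This is only a sketch of the arithmetic: "about 1 hour" is taken as 60 minutes, and since the earlier runs used tgz and the newer ones rsync, the ratio compares elapsed time only, not transfer throughput.

```python
def speedup(before_min: float, after_min: float) -> float:
    """How many times faster the newer run finished (wall-clock ratio)."""
    return before_min / after_min


# (label, minutes on 5.4.1, minutes on 5.5), as reported in the post
runs = [("Cloudron #1", 35, 3), ("Cloudron #2", 60, 36)]

for label, before, after in runs:
    print(f"{label}: {before} min -> {after} min, {speedup(before, after):.1f}x faster")
```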
-
@d19dotca I have added some docs now.
https://cloudron.io/documentation/backups/#schedule has info on the timeout and nice.
https://cloudron.io/documentation/backups/#concurrency-settings on the concurrency settings.