Tar backups timing out on large part numbers
-
@adrw The timeout comes from https://git.cloudron.io/cloudron/box/-/blob/master/src/storage/s3.js#L71. It's set to 600 seconds. Maybe we can try changing it to 1000?
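For reference, here is a minimal sketch of how such a per-request timeout is typically configured, assuming the AWS SDK v2 for JavaScript (aws-sdk). Only the 600-second value comes from s3.js#L71; the credential environment variables and surrounding options are hypothetical:

```
const AWS = require('aws-sdk');

const s3 = new AWS.S3({
    accessKeyId: process.env.S3_ACCESS_KEY_ID,   // hypothetical env var
    secretAccessKey: process.env.S3_SECRET_KEY,  // hypothetical env var
    httpOptions: {
        timeout: 600 * 1000 // milliseconds; each individual S3 HTTP request is aborted after this
    }
});
```

Note this bounds a single network call, not the backup task as a whole.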
-
@girish I adjusted the timeout at the L71 you mentioned to 10000, but the task is still getting killed for taking too long. Is there a broader, task-wide timeout or backup configuration in Cloudron that covers the entire task, as opposed to the L71 value, which seems to govern only the individual network call?
Task 1735 timed out:

Aug 28 10:57:29 box:tasks 1735: {"percent":71.58823529411765,"message":"Copying part 157 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=167503724700-168577466524 (cloud.alxdr.ca)"}
Aug 28 10:58:20 box:tasks 1735: {"percent":71.58823529411765,"message":"Uploaded part 157 - Etag: \"f6e793337c89eb34cb8c62448a4fe6f7\" (cloud.alxdr.ca)"}
Aug 28 10:58:20 box:tasks 1735: {"percent":71.58823529411765,"message":"Copying part 158 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=168577466525-169651208349 (cloud.alxdr.ca)"}
Aug 28 10:59:18 box:tasks 1735: {"percent":71.58823529411765,"message":"Uploaded part 158 - Etag: \"81a168929546a77af7262d1d1304854d\" (cloud.alxdr.ca)"}
Aug 28 10:59:18 box:tasks 1735: {"percent":71.58823529411765,"message":"Copying part 159 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=169651208350-170724950174 (cloud.alxdr.ca)"}
-
@adrw said in Tar backups timing out on large part numbers:
Task 1735 timed out
@adrw Something else is going on. The above error message suggests that the backup task took more than 12 hours. Is that the case? Was the backup running for more than 12 hours?
-
@adrw In that case, the real issue is https://git.cloudron.io/cloudron/box/-/blob/master/src/backups.js#L1235. There is a 12-hour timeout for backup tasks as a whole. Can you bump it to something like 36 hours?
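To illustrate how a task-wide timeout differs from the per-request one above, here is a rough sketch of an overall watchdog; the names and mechanism are assumptions for illustration, not box's actual code at backups.js#L1235:

```
// Hypothetical overall task timeout, independent of any per-request timeout.
const BACKUP_TASK_TIMEOUT = 12 * 60 * 60 * 1000; // 12 hours, in milliseconds

function runWithTimeout(taskFn, timeoutMs) {
    return new Promise((resolve, reject) => {
        const timer = setTimeout(() => reject(new Error('backup task timed out')), timeoutMs);
        taskFn().then(
            (result) => { clearTimeout(timer); resolve(result); },
            (error) => { clearTimeout(timer); reject(error); }
        );
    });
}

// runWithTimeout(doBackup, BACKUP_TASK_TIMEOUT) kills the task even when
// every individual S3 call finishes within its own 600-second limit.
```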
For copy concurrency, there is already a slider for that. Don't you see it under Advanced? Note that where it is failing now is in copying the parts of a single file (basically, the file is so big that we have to split it into many parts). I have to check whether parallel multi-part copy is allowed. If it is, it's an easy change.
-
@adrw It seems parallel multi-part copy will work, per https://dzone.com/articles/amazon-s3-parallel-multipart. Looks like a good change to make; a sketch of the idea follows.
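Roughly, a parallel multipart copy could look like the sketch below, assuming the AWS SDK v2 for JavaScript. The ~1 GiB part size mirrors the byte ranges in the log above; the function and bucket/key names are hypothetical, not box's actual code:

```
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// Copy sourceKey to destKey by issuing all uploadPartCopy calls concurrently.
async function parallelMultipartCopy(bucket, sourceKey, destKey, totalSize) {
    const PART_SIZE = 1024 * 1024 * 1024; // ~1 GiB per part, as in the log above
    const { UploadId } = await s3.createMultipartUpload({ Bucket: bucket, Key: destKey }).promise();

    const partPromises = [];
    for (let partNumber = 1, start = 0; start < totalSize; partNumber++, start += PART_SIZE) {
        const end = Math.min(start + PART_SIZE, totalSize) - 1;
        partPromises.push(s3.uploadPartCopy({
            Bucket: bucket,
            Key: destKey,
            CopySource: `${bucket}/${sourceKey}`,      // source as "bucket/key"
            CopySourceRange: `bytes=${start}-${end}`,  // inclusive byte range, as in the log
            PartNumber: partNumber,
            UploadId: UploadId
        }).promise().then((res) => ({ ETag: res.CopyPartResult.ETag, PartNumber: partNumber })));
    }

    // Promise.all preserves input order, so parts arrive sorted by PartNumber.
    const Parts = await Promise.all(partPromises);
    await s3.completeMultipartUpload({
        Bucket: bucket,
        Key: destKey,
        UploadId: UploadId,
        MultipartUpload: { Parts: Parts }
    }).promise();
}
```

In practice one would cap the number of in-flight part copies (for example, using the existing concurrency slider's value) rather than firing every part at once.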
-
@adrw I have now made the timeout 24 hours. The timeout is really just there to kill "stuck" backups, which usually only happens when there is a bug in our code.