Tar Backups timing out on too large part number

Support
Tags: backblaze, backups, feature-request
girish (Staff), post #9:

@adrw It looks like the upload succeeded! What has failed is the copy step, which also uses a part size. That copy part size is hardcoded to 5GB, and it seems B2 takes more than 300 seconds to copy a 5GB file. Can you please adjust this line - https://git.cloudron.io/cloudron/box/-/blob/master/src/storage/s3.js#L258

    const largeFileLimit = apiConfig.provider === 'exoscale-sos' ? 1024 * 1024 * 1024 : 5 * 1024 * 1024 * 1024;

to

    const largeFileLimit = apiConfig.provider === 'exoscale-sos' ? 1024 * 1024 * 1024 : 1 * 1024 * 1024 * 1024;

So we are reducing the part size to 1GB above. I will make the copy part size configurable as well then.
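For context, the "part number" in the topic title comes from how many fixed-size pieces the backup archive is split into, so a smaller part size means more parts for the same archive. A minimal sketch of that arithmetic (the file size used here is purely illustrative; the 10,000-part cap is the general S3 multipart limit, not something stated in this thread):

    // Rough arithmetic behind the part number: parts = ceil(fileSize / partSize).
    // S3-compatible multipart APIs cap a single upload/copy at 10,000 parts, so
    // shrinking the part size trades long per-part copies for more parts overall.
    function partCount(fileSizeBytes, partSizeBytes) {
        return Math.ceil(fileSizeBytes / partSizeBytes);
    }

    const GiB = 1024 * 1024 * 1024;
    console.log(partCount(250 * GiB, 5 * GiB)); // 50 parts with the 5GB limit
    console.log(partCount(250 * GiB, 1 * GiB)); // 250 parts with a 1GB limit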


adrw, post #10:

@girish Awesome! Going to give that a shot and report back. Thanks again for your help, very hyped to get back to using tar backups and cutting my B2 ingress costs.

girish (Staff), post #11:

@adrw It looks like my suggestion was wrong. We always do only 1GB copies; I read the code wrong. Re-reading the logs, it seems that after around 90 parts the copies start failing (so ~90GB). It seems the server returned a 500. We might have to contact B2 and ask them if they have size restrictions on file copies.
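For readers following along, this is roughly the shape of a multipart server-side copy in the Node.js aws-sdk v2 API (createMultipartUpload, uploadPartCopy with a byte range, completeMultipartUpload). It is a simplified sketch with placeholder bucket/key names and no retries or error handling, not the actual box code:

    const AWS = require('aws-sdk');

    // Minimal serial multipart copy sketch: copy an object server-side in 1GB
    // slices. Each uploadPartCopy call copies one inclusive byte range.
    async function multipartCopy(s3, bucket, srcKey, destKey, totalSize) {
        const PART_SIZE = 1024 * 1024 * 1024; // 1 GiB per copied part

        const { UploadId } = await s3.createMultipartUpload({ Bucket: bucket, Key: destKey }).promise();

        const parts = [];
        for (let partNumber = 1, start = 0; start < totalSize; partNumber++, start += PART_SIZE) {
            const end = Math.min(start + PART_SIZE, totalSize) - 1;
            const result = await s3.uploadPartCopy({
                Bucket: bucket,
                Key: destKey,
                UploadId,
                PartNumber: partNumber,
                CopySource: `/${bucket}/${srcKey}`,
                CopySourceRange: `bytes=${start}-${end}` // inclusive byte range
            }).promise();
            parts.push({ PartNumber: partNumber, ETag: result.CopyPartResult.ETag });
        }

        await s3.completeMultipartUpload({
            Bucket: bucket,
            Key: destKey,
            UploadId,
            MultipartUpload: { Parts: parts }
        }).promise();
    }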


adrw, post #12 (last edited by adrw):

@girish Thanks. It did fail, though it seems to succeed up to 146 parts and then the logs stop, so I'm assuming the backup job timed out.

Would it be possible to extend the timeout on the Cloudron backup task?

          ...
          
          2020-08-21T01:08:00.405Z box:tasks 1680: {"percent":71.58823529411765,"message":"Uploading backup 261505M@8MBps (cloud.alxdr.ca)"}
          2020-08-21T01:08:10.403Z box:tasks 1680: {"percent":71.58823529411765,"message":"Uploading backup 261566M@8MBps (cloud.alxdr.ca)"}
          2020-08-21T01:08:20.924Z box:tasks 1680: {"percent":71.58823529411765,"message":"Uploading backup 261631M@8MBps (cloud.alxdr.ca)"}
          2020-08-21T01:08:30.844Z box:shell backup-snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a (stdout): 2020-08-21T01:08:30.844Z box:storage/s3 Uploaded snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc: {"Location":"/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc","Bucket":"my-adrw-xyz","Key":"snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc","ETag":"26a95d5fe8b46062247614c72801c523-2617"}
          
          2020-08-21T01:08:30.844Z box:shell backup-snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a (stdout): 2020-08-21T01:08:30.844Z box:backupupload upload completed. error:  null
          
          2020-08-21T01:08:30.844Z box:backups runBackupUpload: result - {"result":""}
          2020-08-21T01:08:31.020Z box:backups cloud.alxdr.ca uploadAppSnapshot: snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a done. 32181.605 seconds
          2020-08-21T01:08:31.043Z box:backups Rotating app backup of c4dd3c6d-806e-454b-beba-4cdd3f29865a to id 2020-08-20-160520-230/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a_2020-08-21-010831-020_v4.6.2
          2020-08-21T01:08:32.129Z box:tasks 1680: {"percent":71.58823529411765,"message":"Copying 0-1. 0 errors so far. concurrency set to 10 (cloud.alxdr.ca)"}
          2020-08-21T01:08:32.130Z box:tasks 1680: {"percent":71.58823529411765,"message":"Copying (multipart) snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc (cloud.alxdr.ca)"}
          2020-08-21T01:08:32.505Z box:tasks 1680: {"percent":71.58823529411765,"message":"Copying part 1 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=0-1073741824 (cloud.alxdr.ca)"}
          2020-08-21T01:09:31.700Z box:tasks 1680: {"percent":71.58823529411765,"message":"Uploaded part 1 - Etag: \"d2ad221b465e2361c366950c1eefb51c\" (cloud.alxdr.ca)"}
          2020-08-21T01:09:31.700Z box:tasks 1680: {"percent":71.58823529411765,"message":"Copying part 2 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=1073741825-2147483649 (cloud.alxdr.ca)"}
          2020-08-21T01:10:23.055Z box:tasks 1680: {"percent":71.58823529411765,"message":"Uploaded part 2 - Etag: \"34af51edde0b3805ac418e080a860660\" (cloud.alxdr.ca)"}
          2020-08-21T01:10:23.055Z box:tasks 1680: {"percent":71.58823529411765,"message":"Copying part 3 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=2147483650-3221225474 (cloud.alxdr.ca)"}
          2020-08-21T01:13:38.471Z box:tasks 1680: {"percent":71.58823529411765,"message":"Uploaded part 3 - Etag: \"9c2fa7630ffb42e4879fa3454c52ba63\" (cloud.alxdr.ca)"}
          2020-08-21T01:13:38.471Z box:tasks 1680: {"percent":71.58823529411765,"message":"Copying part 4 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=3221225475-4294967299 (cloud.alxdr.ca)"}
          
          ...
          
          2020-08-21T04:02:05.786Z box:tasks 1680: {"percent":71.58823529411765,"message":"Uploaded part 143 - Etag: \"c2379cf098ff6e944931add463449190\" (cloud.alxdr.ca)"}
          2020-08-21T04:02:05.786Z box:tasks 1680: {"percent":71.58823529411765,"message":"Copying part 144 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=153545080975-154618822799 (cloud.alxdr.ca)"}
          2020-08-21T04:03:08.359Z box:tasks 1680: {"percent":71.58823529411765,"message":"Uploaded part 144 - Etag: \"d163000653b34764f469adf7ef1038cf\" (cloud.alxdr.ca)"}
          2020-08-21T04:03:08.359Z box:tasks 1680: {"percent":71.58823529411765,"message":"Copying part 145 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=154618822800-155692564624 (cloud.alxdr.ca)"}
          2020-08-21T04:04:20.333Z box:tasks 1680: {"percent":71.58823529411765,"message":"Uploaded part 145 - Etag: \"4b38a4e475683fefcb35e3d20af5d93d\" (cloud.alxdr.ca)"}
          2020-08-21T04:04:20.333Z box:tasks 1680: {"percent":71.58823529411765,"message":"Copying part 146 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=155692564625-156766306449 (cloud.alxdr.ca)"}
          
girish (Staff), post #13:

@adrw The timeout comes from https://git.cloudron.io/cloudron/box/-/blob/master/src/storage/s3.js#L71. It's set to 600 seconds. Maybe we can try changing it to 1000?
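For reference, in the Node.js aws-sdk the per-request timeout is normally passed to the S3 client in milliseconds via httpOptions. The sketch below only illustrates that pattern and assumes the line in s3.js works this way; the endpoint and values are placeholders:

    const AWS = require('aws-sdk');

    // The aws-sdk socket timeout is in milliseconds, so "600 seconds" corresponds
    // to 600 * 1000 here; bumping it to 1000 seconds means 1000 * 1000.
    // The endpoint is a placeholder for an S3-compatible Backblaze B2 endpoint.
    const s3 = new AWS.S3({
        endpoint: 'https://s3.us-west-002.backblazeb2.com',
        httpOptions: { timeout: 1000 * 1000 }
    });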

adrw, post #14:

Thanks @girish! Going to try that out; mine was previously set to 300, so 1000 should hopefully be a sufficient allowance. Will report back.


adrw, post #15 (last edited by adrw):

@girish I adjusted the timeout at the L71 you mentioned to 10000, but the task is still getting killed for taking too long. Is there any broader Cloudron task-wide timeout or backup configuration for the entire task, as opposed to the L71 setting, which seems to be focused on the individual network call timeout?

                Task 1735 timed out
                
                Aug 28 10:57:29 box:tasks 1735: {"percent":71.58823529411765,"message":"Copying part 157 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=167503724700-168577466524 (cloud.alxdr.ca)"}
                Aug 28 10:58:20 box:tasks 1735: {"percent":71.58823529411765,"message":"Uploaded part 157 - Etag: \"f6e793337c89eb34cb8c62448a4fe6f7\" (cloud.alxdr.ca)"}
                Aug 28 10:58:20 box:tasks 1735: {"percent":71.58823529411765,"message":"Copying part 158 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=168577466525-169651208349 (cloud.alxdr.ca)"}
                Aug 28 10:59:18 box:tasks 1735: {"percent":71.58823529411765,"message":"Uploaded part 158 - Etag: \"81a168929546a77af7262d1d1304854d\" (cloud.alxdr.ca)"}
                Aug 28 10:59:18 box:tasks 1735: {"percent":71.58823529411765,"message":"Copying part 159 - /my-adrw-xyz/snapshot/app_c4dd3c6d-806e-454b-beba-4cdd3f29865a.tar.gz.enc bytes=169651208350-170724950174 (cloud.alxdr.ca)"}
                

adrw, post #16:

@girish Any ideas on how to proceed given the continued timeouts?


adrw, post #17:

@girish Alternatively, could we add configurability for parallel copy threads, similar to what was added to the rsync advanced configuration options? If we can increase parallel copying, then I think this tar backup task will succeed.

girish (Staff), post #18:

@adrw said in Tar Backups timing out on too large part number:

    Task 1735 timed out

@adrw Something else is going on. The above error message suggests that the backup task took more than 12 hours. Is that the case? Was the backup running for more than 12 hours?


adrw, post #19:

@girish Yes, pretty sure it was.

girish (Staff), post #20:

@adrw In that case, the real issue is https://git.cloudron.io/cloudron/box/-/blob/master/src/backups.js#L1235. There is a timeout of 12 hours for backup tasks. Can you bump it to like 36 or something?

For copy concurrency, there is already a slider for that. Don't you see it under advanced? Note that where it is failing now, it has to copy the parts of the same file (basically, the single file is so big that we have to split it up into many parts). I have to check if parallel multi-part copy is allowed. If it is, it's easy to do.
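The 12-hour limit is a watchdog on the whole backup task rather than a per-request setting. The snippet below is only a generic illustration of that kind of deadline, with hypothetical helper names; it is not the actual backups.js implementation:

    // Generic watchdog pattern: let a long-running task proceed, but abort it if
    // it exceeds a wall-clock deadline. The 12-hour constant mirrors the value
    // mentioned above; raising it to 24 or 36 hours only changes this number.
    const BACKUP_TASK_TIMEOUT_MS = 12 * 60 * 60 * 1000;

    function runWithDeadline(taskPromise, abortTask) {
        const timer = setTimeout(abortTask, BACKUP_TASK_TIMEOUT_MS);
        return taskPromise.finally(() => clearTimeout(timer));
    }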

girish (Staff), post #21:

@adrw It seems parallel multi-part copy will work per https://dzone.com/articles/amazon-s3-parallel-multipart. Looks like a good change to make.
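A minimal sketch of what that parallel multi-part copy could look like on top of the same aws-sdk calls: issue uploadPartCopy for a bounded batch of parts at a time and collect the ETags in part-number order for completeMultipartUpload. The concurrency value and parameter names are illustrative, not the eventual Cloudron implementation:

    // Parallel variant of the multipart copy. `ranges` is an array of inclusive
    // byte-range strings like "bytes=0-1073741823"; `params` carries Bucket, Key,
    // UploadId and CopySource. A concurrency of 5 is an arbitrary example value.
    async function copyPartsInParallel(s3, params, ranges, concurrency = 5) {
        const parts = [];
        for (let i = 0; i < ranges.length; i += concurrency) {
            await Promise.all(ranges.slice(i, i + concurrency).map(async (range, j) => {
                const partNumber = i + j + 1;
                const result = await s3.uploadPartCopy({
                    ...params,
                    PartNumber: partNumber,
                    CopySourceRange: range
                }).promise();
                parts[partNumber - 1] = { PartNumber: partNumber, ETag: result.CopyPartResult.ETag };
            }));
        }
        return parts; // feed into completeMultipartUpload as MultipartUpload.Parts
    }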


adrw, post #22:

@girish The only advanced setting I see for tar backups is memory; there are none of the parallel copy or upload/download options that I see when I'm in rsync mode.


adrw, post #23:

@girish Thanks! That's what I was looking for; I'll give that a shot.


girish (Staff), post #24:

@adrw Ah indeed. When doing a tar backup, there is only one file to upload and copy, so there is nothing to do in parallel (apart from the multi-part copy, which I think can be hardcoded).


adrw, post #25:

@girish Could the copy that happens at the end of the task (I think from the snapshot folder to the timestamped one) be done in parallel? It seems to be done serially right now, which contributes to the longer task time to some extent.


adrw, post #26:

@girish Could this timeout be configurable in a future release?

girish (Staff), post #27:

@adrw I have made the timeout 24 hours now. The timeout is really just there to kill "stuck" backups. This can usually only happen when there is a bug in our code.


adrw, post #28:

@girish Great! Thanks again for your help debugging this and adding more configuration. Huge help for larger backups like mine.
