Cloudron Forum


Backup formats for object storage - is any one of them more efficient/quicker than the other?

Category: Discuss
Tags: backups, rsync, tgz
d19dotca wrote (#1), last edited by girish:

    So I know rsync is generally better for local disk (or external disk) storage, as it's super quick and saves disk space. That is my experience anyways. However, when using object storage (which is what I want to move to from an external disk), it takes longer, which is expected since it's over the network, but I'm not sure which format is more efficient in that use case. Is it rsync, as I'd have assumed, or tgz?

    If it matters, I have some larger sites (~3 GB) and many smaller ones (~200 MB), and then some apps that take very little storage such as Radicale and Bitwarden, etc. Usually the tgz image is about 12 GB in size, with about 35 GB of disk space used in the Cloudron all together. Any suggestions which one to use?

    Has anyone had experience with this themselves with object storage? Does either format seem more efficient than the other? My guess, from my own testing so far, is that they're about one and the same, but I'd love feedback in case there's a technical advantage to one of them when using object storage. At first I assumed it'd be rsync, but it doesn't seem any faster than tgz. My assumption is that's because rsync takes quite a while to get the list of what's changed when it has to cross the network (most object storage providers are also quite limited in their transfer speeds, so usually less than 8 Mbps in my experience with DigitalOcean and OVH), while tgz uploads a single compressed file instead, so in the end they sort of even out. But this is just my very limited testing so far, and I'd love to know what others have experienced.

    --
    Dustin Dauncey
    www.d19.ca

    nebulon (Staff) wrote (#2):
      The general take on this is that it depends 😉
      The tarball is generally much better for lots of small files, or simply for small backups. Especially with object storage, this greatly reduces the number of network requests involved (essentially only a few requests are required, compared to rsync, which needs requests per file within the backup).

      The tarball, on the other hand, is not good when the backup contains lots of larger files, for example with Nextcloud. The tarball creation needs a lot of memory and is prone to fail because of that, depending on the available server resources. rsync, especially with hardlinks, also reduces the required amount of backup storage overall.
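      The request-count difference nebulon describes can be sketched with a back-of-envelope calculation. This is only illustrative (the file count, part size, and per-file request assumption are made up, not Cloudron's actual numbers):

      ```python
      # Rough request-count comparison: rsync-style backups issue roughly one
      # PUT per file, while a tarball can go up as a single multipart upload.
      def rsync_requests(num_files: int) -> int:
          # One request per file; listing/delete requests are ignored here.
          return num_files

      def tarball_requests(archive_bytes: int, part_bytes: int = 64 * 1024 * 1024) -> int:
          # Multipart upload: initiate + one PUT per part + complete.
          parts = -(-archive_bytes // part_bytes)  # ceiling division
          return 2 + parts

      # e.g. an app with ~40,000 small files vs. a ~12 GB tarball:
      print(rsync_requests(40_000))          # 40000 requests
      print(tarball_requests(12 * 1024**3))  # 194 requests
      ```

      With per-request latency to an object store, that gap is where the tarball wins for many small files; the memory cost of tarball creation is the tradeoff going the other way.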

        jdaviescoates wrote (#3), in reply to nebulon:

        @nebulon this is why being able to define backup format per app would be a nice addition.

        I use Cloudron with Gandi & Hetzner

          d19dotca wrote (#4), last edited by d19dotca:

          So in my initial testing yesterday evening, tgz seems to be the format to use if time is a factor. For example, my full system backup to OVH Object Storage took roughly 20 minutes. Using rsync, however, both the first and second runs took well over an hour (almost 3 hours for the first one, but that's to be expected since it takes longer the first time around).

          So even though I may be using more disk space with tgz, and thus paying a little bit more, I think it's worth it: there are times when I want to do a full system backup before doing an update or something like that, and I don't want to wait an hour or more for it to finish when I just want to get going with the maintenance.

          My main reason to switch to object storage is that I don't want to have to worry about space again. Using an external disk was way quicker (just a few minutes using rsync) but much more costly too, and I'd also run into occasional space limitations that were annoying to fix.

          --
          Dustin Dauncey
          www.d19.ca

            girish (Staff) wrote (#5), last edited by girish:

            @d19dotca It's slow not because of the format but because we set a very low concurrency. Specifically, we only make about 10 requests in parallel at a time. So, if you have a lot of files, this can take a while! For AWS S3 alone, we set this concurrency to 500. This is because AWS doesn't seem to fail even then, but all other providers (especially DO Spaces back in the day) used to fail and return 500s all the time.

            I will look into this for the next release, it's easy to speed things up.
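            The concurrency cap girish mentions can be sketched as a thread pool whose worker count limits how many requests are in flight at once. This is a hypothetical illustration, not Cloudron's actual (Node.js) code; `upload_one` stands in for a real per-file PUT:

            ```python
            import concurrent.futures

            def upload_all(keys, upload_one, concurrency=10):
                """Upload every key, with at most `concurrency` requests in flight."""
                results = {}
                with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
                    # The pool size is the knob: 10 is conservative and provider-friendly,
                    # a higher value (e.g. 500) finishes faster but risks 500 errors.
                    futures = {pool.submit(upload_one, key): key for key in keys}
                    for fut in concurrent.futures.as_completed(futures):
                        results[futures[fut]] = fut.result()
                return results

            # Example with a stand-in uploader:
            done = upload_all(["a", "b", "c"], lambda k: f"uploaded {k}", concurrency=2)
            ```

            With per-file backups, total time is roughly (files × per-request latency) ÷ concurrency, which is why raising the cap helps so much when there are many small files.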

              d19dotca wrote (#6), in reply to girish:

              @girish Good to know. That option being configurable in the future would be great too, rather than just increasing it in code.

              --
              Dustin Dauncey
              www.d19.ca

                d19dotca wrote (#7), in reply to girish, last edited by d19dotca:

                @girish I can open a new thread if you'd like, but figured I'd quickly tack this onto this post here... I found a KB article on OVH for an "optimized method for uploading" to their Object Storage to get better performance. Was wondering if this is something that needs to be incorporated into the Cloudron backup package somehow for improvements in performance?

                https://docs.ovh.com/ca/en/storage/optimised_method_for_uploading_files_to_object_storage/


                Ultimately I'm trying to see how I can improve my backup speeds, if it's even possible. Right now using tgz format to OVH Object Storage takes me about 33 minutes. For what it's worth, the snapshot folder is ~15 GB in size (compressed I guess since tgz is used).

                --
                Dustin Dauncey
                www.d19.ca

                  girish (Staff) wrote (#8), in reply to d19dotca:

                  @d19dotca said in Backup formats for object storage - is any one of them more efficient/quicker than the other?:

                  Ultimately I'm trying to see how I can improve my backup speeds, if it's even possible. Right now using tgz format to OVH Object Storage takes me about 33 minutes. For what it's worth, the snapshot folder is ~15 GB in size (compressed I guess since tgz is used).

                  The tgz in Cloudron is created as a "stream" and is not a real file in the file system (if we made a separate file, we would consume extra space in the file system equal to the compressed size, 15GB in your case, at least temporarily during the backup). Because it's a stream created on the fly, we cannot do a "parallel" upload of this file by breaking it up into parts. Speed optimizations can be done in rsync mode, though; some of the settings are exposed in the Advanced section of backups, like the concurrency etc. Finally, I think segment* is Swift-specific. We use S3 APIs, where these are usually called chunk/part size.
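                  The streaming idea girish describes can be sketched with Python's `tarfile` in pipe mode, which gzips as it writes and never materializes the whole archive on disk. This is only an illustration of the technique, not Cloudron's implementation; the `BytesIO` here stands in for the network socket a real backup would write to:

                  ```python
                  import io
                  import tarfile

                  # "w|gz" is tarfile's streaming mode: members are compressed and
                  # emitted as they are added, so the total archive size is unknown
                  # until the stream ends. That is exactly why such a stream cannot
                  # be pre-split into parts for a parallel multipart upload.
                  stream = io.BytesIO()  # stand-in for the upload socket
                  with tarfile.open(fileobj=stream, mode="w|gz") as tar:
                      data = b"hello backup"
                      info = tarfile.TarInfo(name="app/data.txt")
                      info.size = len(data)
                      tar.addfile(info, io.BytesIO(data))

                  archive = stream.getvalue()  # in a real backup, already on the wire
                  ```

                  A sequential multipart upload of the stream is still possible (buffer one part, send it, repeat), but the parts can only be produced one after another, so the parallelism that makes multipart fast for on-disk files is unavailable.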

                    d19dotca wrote (#9), in reply to girish:

                    @girish Ah okay, that makes sense on the tgz part. Unfortunately I haven't had much success getting quicker times with rsync in my tests against OVH Object Storage (in fact it's often closer to double the tgz time, taking between 1 and 2 hours). But maybe I just haven't found that 'sweet spot' yet. I'll keep testing. Thanks Girish.

                    --
                    Dustin Dauncey
                    www.d19.ca
