Backup formats for object storage - is any one of them more efficient/quicker than the other?
d19dotca last edited by girish
So I know rsync is generally better for local disk (or external disk) storage, as it's super quick and saves disk space. That is my experience anyway. However, when using object storage (which is what I want to move to from an external disk), backups take longer, which is expected since they go over the network, and I'm not sure which format is more efficient in that use case. Is it rsync as I'd have assumed, or tgz?
If it matters, I have some larger sites (~3 GB) and many smaller ones (~200 MB), and then some apps that take very little storage such as Radicale and Bitwarden, etc. Usually the tgz image is about 12 GB in size, with about 35 GB of disk space used on the Cloudron altogether. Any suggestions which one to use?
Has anyone had experience with this themselves with object storage? Does either format in particular seem more efficient than the other? My guess is that they're about the same, based on my own testing so far, but I would love feedback in case there's a more technical advantage to one of them when using object storage. At first I assumed it'd be rsync, but it doesn't seem any faster than tgz. My assumption is that rsync takes quite a while to get the list of what's changed when it has to cross the network (most object storage providers are also quite limited in their data transmission, so usually less than 8 Mbps in my experience with DigitalOcean and OVH), while tgz uploads a single compressed file instead, so in the end they sort of even out. But this is just my very limited testing so far and I'd love to know what others have experienced.
The general take on this is that it depends.
The tarball is generally much better for lots of small files or simply for small backups. Especially with object storage, this reduces the number of network requests a lot (essentially only a few requests are required, compared to rsync, which requires requests per file within the backup).
The tarball, on the other hand, is not a good fit when there are lots of larger files, for example in Nextcloud. Tarball creation needs a lot of memory and is prone to fail because of that, depending on the available server resources. Rsync, especially with hardlinks, reduces the required amount of backup storage overall.
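To make the request-count argument above concrete, here is a rough back-of-the-envelope sketch. It is not how Cloudron counts or issues requests: the 64 MB part size, the two extra multipart calls, and the example file counts are all assumptions for illustration only.

```python
import math

def tarball_requests(total_bytes, part_size=64 * 1024 * 1024):
    """Requests for one archive uploaded in multipart chunks.

    The part size is an assumed value, not a Cloudron setting.
    """
    parts = max(1, math.ceil(total_bytes / part_size))
    # +2 for the assumed "initiate" and "complete" multipart calls
    return parts + 2

def per_file_requests(file_sizes):
    """One upload request per file, as in a file-by-file (rsync-style) backup."""
    return len(file_sizes)

# Hypothetical app: 20,000 files of 50 KB each (about 1 GB in total)
files = [50 * 1024] * 20_000

print("tarball requests: ", tarball_requests(sum(files)))
print("per-file requests:", per_file_requests(files))
```

Under these assumptions the tarball needs a few dozen requests at most, while the per-file approach needs one request for every file, so per-request latency dominates when there are many small files.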
jdaviescoates last edited by
@nebulon this is why being able to define backup format per app would be a nice addition.
d19dotca last edited by d19dotca
So in my initial testing yesterday evening... TGZ seems to be the format to use if time is a factor. For example, my full system backup to OVH Object Storage took roughly 20 minutes with TGZ. With rsync, however, both the first and the second run took well over an hour (the first one took almost 3 hours, but it’s to be expected that the first run takes longer). So even though I may be using more disk space with TGZ and thus paying a little bit more, I think it’s worth it, because there are times when I want to do a full system backup before doing an update or something like that, and I don’t want to have to wait an hour or more for it to finish when I just want to get going with the maintenance. My main reason to switch to object storage is that I don’t want to have to worry about space again. Using an external disk was way quicker (just a few minutes using rsync) but much more costly too, and I would also run into occasional space limitations that were annoying to fix.
@d19dotca It's slow not because of the format but because we set a very low concurrency. Specifically, we only make like 10 requests in parallel at a time. So, if you have a lot of files, this can take a while! For AWS S3 alone, we set this concurrency to 500. This is because AWS doesn't even seem to fail, but all other providers (especially DO Spaces back in the day) used to fail and return HTTP 500 errors all the time.
I will look into this for the next release, it's easy to speed things up.
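For readers wondering what a concurrency limit like this looks like in practice, here is a minimal sketch using a semaphore to cap the number of uploads in flight. It is not Cloudron's actual code: `upload_all`, `fake_upload`, and the file names are hypothetical, and the upload is simulated with a short sleep instead of a real request.

```python
import asyncio

async def upload_all(names, concurrency, upload_one):
    """Run upload_one(name) for every name, with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def guarded(name):
        async with sem:  # wait here once `concurrency` uploads are already running
            return await upload_one(name)

    return await asyncio.gather(*(guarded(n) for n in names))

# Stand-in for a real upload request: it only sleeps and tracks parallelism.
in_flight = 0
peak = 0

async def fake_upload(name):
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.001)  # simulated network round trip
    in_flight -= 1
    return name

names = [f"file-{i}" for i in range(100)]
results = asyncio.run(upload_all(names, 10, fake_upload))
print("uploaded:", len(results), "peak parallel uploads:", peak)
```

With a limit of 10, a backup of many small files is processed in waves of at most 10 requests, which is why raising the limit shortens file-by-file backups as long as the provider tolerates the extra load.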
d19dotca last edited by
@girish Good to know. That option being configurable in the future would be great too, rather than just increasing it in code.