Performance of Object Storage, any insights?
-
I used to be an Object Storage expert and spent time working on the technology at IBM, where I wrote a Redbook on it.
You need to understand that you cannot treat objects like files.
File access patterns are designed around disks with sectors and blocks, typically 4 KB each.
Object access patterns are designed for large chunks: hundreds of MBs, and at times TBs or even PBs. In other words, object stores don't like a lot of small transactions.
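To make the "small transactions" point concrete, here is a minimal toy model, assuming every request pays a fixed overhead (round trip, auth, metadata work on the store side) and the payload then streams at a fixed bandwidth. The function name `transfer_seconds` and all the numbers are hypothetical, chosen only for illustration.

```python
# Toy cost model (assumption): each request costs a fixed overhead,
# and payload bytes then stream at a constant link bandwidth.

def transfer_seconds(num_objects: int, total_bytes: int,
                     per_request_s: float, bandwidth_bps: float) -> float:
    """Estimated time to upload total_bytes split across num_objects sequential PUTs."""
    overhead = num_objects * per_request_s     # paid once per object
    streaming = total_bytes * 8 / bandwidth_bps  # bytes -> bits, then divide by bits/s
    return overhead + streaming


if __name__ == "__main__":
    total = 1_000_000_000  # 1 GB of data (hypothetical)
    per_request = 0.05     # 50 ms overhead per request (hypothetical)
    bandwidth = 100e6      # 100 Mbit/s link (hypothetical)
    small = transfer_seconds(total // 4096, total, per_request, bandwidth)
    large = transfer_seconds(1, total, per_request, bandwidth)
    print(f"4 KB objects: {small:,.0f} s, one object: {large:,.0f} s")
```

With these made-up numbers, the 4 KB case spends about 12,200 s on request overhead against only 80 s of actual streaming, while the single large object takes roughly 80 s in total. Real stores also parallelize requests, so treat this as an illustration of the trend, not a prediction.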
I had been using Scaleway for backups until they screwed up their systems in FR.
I then added IBM Cloud Object Storage (COS) to the mix, and it is much more performant. See the other threads about rclone and mounting remote object stores.
-
@d19dotca No, it's fine for backups, but to fully optimize it, we need to treat it as object storage, not file storage.
That requires rethinking how we handle, bundle, and send out backup files.
For example, avoid rsync mode, which means tons of small writes. tgz is better, but we can do even better.
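The bundling idea can be sketched with Python's standard `tarfile` module: pack many small files into one gzip-compressed tar so the upload becomes a single large object instead of one PUT per file. `bundle_directory` is a hypothetical helper for illustration, not how Cloudron builds its tgz backups.

```python
import os
import tarfile


def bundle_directory(src_dir: str, archive_path: str) -> int:
    """Pack every file under src_dir into one .tar.gz (hypothetical helper).

    Returns the number of files added. archive_path should live outside
    src_dir, otherwise the archive could end up trying to include itself.
    """
    count = 0
    with tarfile.open(archive_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Store paths relative to src_dir so the archive restores cleanly.
                tar.add(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count
```

Usage would be something like `bundle_directory("/path/to/maildir", "/tmp/mail.tgz")` followed by one upload of the resulting file (both paths are placeholders). The trade-off is that any change means re-uploading the whole archive, which is exactly what rsync-style backups try to avoid.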
-
One thing to keep in mind: while an app is backing up, the app itself is not down. It's still accessible and usable by users. So even if the backup takes a while, which I understand is annoying, at least there is no real downtime. The only downtime is when the container is re-created, which usually takes under a minute.
To answer the original question, it's best to choose an object storage provider/region that is geographically close to the server. Beyond that, I have not done any specific performance measurements.
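One rough way to compare candidate regions is to time TCP connections to each endpoint, since a handshake costs about one network round trip and every request to the store pays at least that. This is a minimal sketch using only the standard library; `connect_rtt_ms` and `closest_endpoint` are hypothetical helpers, and the timing also includes name resolution.

```python
import socket
import statistics
import time


def connect_rtt_ms(host: str, port: int = 443, samples: int = 5,
                   timeout: float = 3.0) -> float:
    """Median TCP connect time to host:port in milliseconds (includes DNS lookup)."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close a connection; we only time the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        times.append((time.perf_counter() - start) * 1000)
    return statistics.median(times)


def closest_endpoint(hosts, port: int = 443) -> str:
    """Return the host with the lowest median connect time."""
    return min(hosts, key=lambda h: connect_rtt_ms(h, port))
```

Pass it the S3 endpoint hostnames of the regions you are considering, run from the Cloudron server itself. Low connect time doesn't guarantee high throughput, but it bounds how fast many small requests can go.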
-
@girish Fair enough. I'm trying Vultr Object Storage at the moment (I just started some tests today), and its performance is a smidge better than OVH so far with the same general settings, getting around 8-15 Mbps for most of the app updates. But I noticed one behaviour with both OVH and Vultr (so I assume others too): when it gets to the box data (emails), it slows to a crawl, ranging from just 1-3 Mbps. With about 25 GB of box data to upload, that makes this an incredibly lengthy process. Any ideas why the speed consistently drops when Cloudron gets to the box data?
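For a sense of scale, here is the arithmetic for the 25 GB figure above. It's not clear from the post whether "Mbps" means megabits or megabytes per second, so this small helper (hypothetical, decimal units assumed) computes both.

```python
def upload_hours(size_bytes: float, rate: float, unit: str = "Mbps") -> float:
    """Hours needed to move size_bytes at a steady rate.

    unit="Mbps" means megabits per second; unit="MB/s" means megabytes per second.
    Decimal units are assumed (1 MB = 1,000,000 bytes).
    """
    if unit == "Mbps":
        bytes_per_s = rate * 1_000_000 / 8
    elif unit == "MB/s":
        bytes_per_s = rate * 1_000_000
    else:
        raise ValueError(f"unknown unit: {unit}")
    return size_bytes / bytes_per_s / 3600


if __name__ == "__main__":
    box = 25e9  # ~25 GB of box data, as reported above
    for rate in (1, 3):
        print(f"{rate} Mbps: {upload_hours(box, rate):.1f} h, "
              f"{rate} MB/s: {upload_hours(box, rate, 'MB/s'):.1f} h")
```

At 1-3 megabits per second, 25 GB takes roughly 18.5 to 55.6 hours; at 1-3 megabytes per second it is roughly 2.3 to 6.9 hours. Either way, a drop from 8-15 to 1-3 multiplies the box data portion of the backup several times over.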
-
@d19dotca said in Performance of Object Storage, any insights?:
when it gets to the box data (emails) it slows to a crawl
If the backup is processing emails individually, that would make sense. Too many small files make the object store communication too chatty, which slows down the actual data transfer.
If it's one big tgz lump, it will be faster.
-
@robi I've always used the tgz format because it always seemed more performant. However, I've noticed I get pretty much the same behaviour no matter which format I use: I just tried rsync, and the logs show basically the same thing, with the transfer speed really slow during the box data portion of the backup but about 3 times faster on the other portions.
Funnily enough, in my extensive testing with OVH Object Storage, rsync was always really bad for object storage. I tried rsync a little earlier today: first a full backup and then a copy. The first full one took a long time (about 1.25 hours), but the subsequent one took only about 17 minutes, which I was super impressed with. I never once got that behaviour with OVH; the second rsync run always took well over an hour no matter what. Perhaps Vultr's Object Storage handles the copy commands (or something) better than OVH's did.
-
@robi Oh yeah, I get that; that's how rsync works. My point is that with OVH I never got the "rsync benefit" of differential backups, and my server isn't busy enough to have thousands of files changed in a few hours, lol. Yet with Vultr's Object Storage it seems to be working, and I can see the benefit of rsync. I know OVH uses OpenStack in the backend, which also auto-created folders called "+segments", so it could have been some weird compatibility issue between rsync and OVH's Object Storage / OpenStack.
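The "rsync benefit" boils down to change detection: by default rsync's quick check compares file size and modification time and skips files where both match. Below is a minimal sketch of that idea for an incremental backup plan. It is a generic illustration, not Cloudron's actual implementation; the assumption that unchanged files are reused (for example copied server-side from the previous backup) rather than re-uploaded is mine, and `build_manifest` / `plan_incremental` are hypothetical names.

```python
import os
from typing import Dict, List, Tuple

# relative path -> (size in bytes, mtime in nanoseconds)
Manifest = Dict[str, Tuple[int, int]]


def build_manifest(root: str) -> Manifest:
    """Record size and mtime for every file under root."""
    manifest: Manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            manifest[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return manifest


def plan_incremental(previous: Manifest, current: Manifest
                     ) -> Tuple[List[str], List[str], List[str]]:
    """Split files into (must upload, can reuse from previous backup, deleted)."""
    upload = sorted(p for p, meta in current.items() if previous.get(p) != meta)
    reuse = sorted(p for p, meta in current.items() if previous.get(p) == meta)
    deleted = sorted(p for p in previous if p not in current)
    return upload, reuse, deleted
```

On a mostly idle mail server the `upload` list stays short, which is consistent with a second run finishing much faster than the first. If the store handles the reuse step poorly, though, the savings can disappear, which would fit the OVH experience described above.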