Struggling to Replace MinIO - Advice Welcome!
-
Not a replacement for MinIO, but FYI: I back up 240 GB to a Hetzner Storage Box using SSHFS and tar.gz and it takes 4-5 hours. I imagine rsync would be much quicker after the first run (I'll soon experiment with creating a 2nd backup site to a Scaleway bucket and will report back...)

-

@jdaviescoates 311.37 GB | 151,456 files | 13 apps to a Hetzner Storage Box using SSHFS and rsync in 27 min 26 sec. But that wasn't the initial backup.
-
Thanks for the responses. We are particularly interested in de-duplication, does anyone know if Cloudron backing up to a Hetzner Storage Box will do de-duplicated backups? I was surprised when Backblaze didn't, but maybe I configured something wrong?
-
Depending on your appetite for loss, I would consider backups-in-depth. That is, one backup site is not a backup.
- Use rsync-based backup over SSHFS to Hetzner or similar. You will want to select "use hardlinks" and, if you want it, encryption. The use of hardlinks is, essentially, your de-duplication. (See below.)
- For a second layer of depth, I would consider a (daily? weekly? monthly?) backup of your primary backup site to a secondary. This could be a sync to AWS S3, for example. Note that any S3-compatible backup (B2, Cloudflare R2, etc.) will have both a storage cost and an API cost. If you are dealing with millions of small files in your backups, the API costs will become real, because dedupe requires checking each object and then possibly transferring it (multiple PUT/GET requests per file).
- S3 has the ability to automatically keep multiple versions of a file. You could use this to have an in-place rotation/update of files.
- If you are doing an S3 backup, you can use lifecycle rules to automatically move your S3 content to Glacier. This is much cheaper than "hot" S3 storage. But, you pay a penalty if you download/delete too early or too often.
- As a third, cheap-ish option, get a 2- or 4-bay NAS that can run TrueNAS, and put a pair of 8-12 TB HDDs in it. Configure the disks as a ZFS mirrored pair. Run a cron job once per day/week to pull down the contents of the Hetzner box. (Your cron job will, again, want to use rsync with hardlinks.) You now have a local machine mirroring your hot backups. It is arguably more expensive than some other options (~600 USD up front), but you don't have any "we might run out of space" issues. And, because you're using it to pull, you don't have any weird networking problems: just SCP the data down. (Or rsync it down over SSH.)
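As an illustration of the Glacier point above, here is a minimal sketch of an S3 lifecycle rule. The bucket name, prefix, and day counts are placeholders, not anything from this thread:

```shell
# Sketch of an S3 lifecycle rule (bucket/prefix/day counts are placeholders):
# transition objects under backups/ to Glacier after 30 days, and expire
# noncurrent versions (if S3 versioning is on) after 180 days.
cat > /tmp/lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "backups/" },
      "Transitions": [ { "Days": 30, "StorageClass": "GLACIER" } ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 180 }
    }
  ]
}
EOF
# Applied (for real, with credentials configured) via:
#   aws s3api put-bucket-lifecycle-configuration \
#       --bucket my-backups --lifecycle-configuration file:///tmp/lifecycle.json
echo "lifecycle rule written"
```

The noncurrent-version expiration pairs with the versioning point above: old versions stay around for a bounded window rather than forever.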
Whatever you are doing, consider targeting two different destinations at two different times (per day/alternating/etc.). Or, consider having some combination of backups that give you multiple copies at multiple sites. That could be Hetzner in two regions, with backups run on alternating days, or it could be you backup to a storage box and pull down a clone every day to a local NAS, or ... or ...
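The "pull a clone down to a local NAS every day" option above can be a one-line cron entry. Everything here (user, host, paths, schedule) is a placeholder:

```shell
# Crontab entry (crontab -e) on the NAS: every night at 03:00, pull the
# storage box down over SSH. -H preserves hardlinks, so the deduped
# snapshots stay deduped locally. User/host/paths are hypothetical.
0 3 * * * rsync -aH --delete -e ssh u123456@u123456.your-storagebox.de:backups/ /tank/backups/
```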
Ultimately, your 150GB is small. If you're increasing by a few GB per week, you're saying that you are likely to have 1TB/year. Not knowing your company's finances, this is generally considered a small amount of data. Trying to optimize for cost, immediately, is possibly less important than just getting the backups somewhere.
Other strategies could involve backing up to the NAS locally first, and then using a cron job to borg or rsync to a remote host (possibly more annoying to set up), etc. But you might have more dedupe options then. (borg has dedupe built in, I think, but...) I have a suspicion that your desire to use object storage might be a red herring. But, again, I don't know your constraints/budget/needs/concerns.
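For reference, a typical borg workflow looks roughly like this (repo path, source path, and retention policy are all placeholders, and borg does do chunk-level dedupe across archives):

```shell
# One-time: create an encrypted repository on the remote host.
borg init --encryption=repokey ssh://backup@nas.example/./repo

# Daily: create a dated archive; unchanged chunks are deduped, not re-stored.
borg create --stats ssh://backup@nas.example/./repo::'{now:%Y-%m-%d}' /data

# Keep a bounded history.
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
    ssh://backup@nas.example/./repo
```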
Deduplication: If you use rsync with hardlinks, then each daily backup will automatically dedupe unchanged files. A hardlink is a pointer to a file. So, if you upload super_ai_outputs_day_1.md to your storage on Monday, and it remains unchanged for the rest of time, then each subsequent day is going to be a hardlink to that file. It will, for all intents and purposes, take up zero disk space. So, if you are backing up large numbers of small-to-medium sized files that do not change, SSHFS/rsync with hardlinks is going to naturally dedupe your unchanging old data.

This will not do binary deduplication of different files. So, if you're looking for a backup solution that would (say) identify that two 1 GB files have an identical middle 500 MB, and somehow dedupe that... you need more sophisticated tools and strategies. rsync with hardlinks just makes sure that the same file, backed up every day, does not take (# of days × size) of space. It takes just the original size of the file plus a directory entry for each link.
Note, though, that if you copy a snapshot of your hardlinked backups to an object store, each day's snapshot may take the full size of every file. I'm possibly wrong on that, but I'm not confident that most tools would know what to do with those hardlinks when copying to an object store. I think you'd end up multiplying your storage usage significantly, because your backup tool will have to upload each file as a separate object. (Most object stores have no notion of symlinks/hardlinks.) An experiment with a subset of the data, or even a few files, will tell you the answer to that question.
If you have other questions, you can ask here, or DM me.
-
MinIO is back:
https://blog.vonng.com/en/db/minio-resurrect/
I see in the HN conversation that people are doubting the author, and some others mention that Chainguard will also keep a fork with the CVE patched: https://github.com/chainguard-forks/minio
Maybe it's worth waiting a bit to see which fork gets consistent maintenance.
-
I am in a similar position. I currently use iDrive e2 for backups and it’s fine but it does take around 1.5 hours uploading tarballs from my server. I’m looking at possibly deploying a low-budget Kimsufi server in the same OVH data centre and just mounting that disk as SSHFS to Cloudron on my primary server, haven’t tried it out yet. If I go this way I will likely still keep iDrive as a second backup destination and just run it a little less frequently and with lower retention to save on costs a little bit there.
I’m wondering about MinIO alternatives, as I tried MinIO on a second Cloudron install but it seemed to take even longer than uploading to iDrive e2 somehow (I expected it’d be quicker, not slower). The project seems dead too, but it also looks like there’s an active fork that maybe the Cloudron @staff can look into using instead. It brings back many of the lost MinIO features, by the sounds of it.
Thinking of other avenues to keep backups more “local” or as close to local as possible for rapid quick backups, and then completely offsite as a second backup plan too.
I have around 65 GB compressed to back up, around 125 GB uncompressed, I believe.
-
I haven’t used Garage yet, but isn’t it just another S3-compatible store? So it’d basically be a MinIO replacement, right? Do we have any other options for “hardlinks” using rsync? I kind of think the Surfer app would honestly be a great backup target somehow, if it could be used to expose a disk.