Struggling to Replace MinIO - Advice Welcome!
-
Not a replacement for MinIO, but FYI: I back up 240 GB to a Hetzner Storage Box using SSHFS and tar.gz and it takes 4-5 hours. I imagine rsync would be much quicker after the first run (I'll soon experiment with creating a 2nd backup site to a Scaleway bucket and will report back...)

-

@jdaviescoates 311.37 GB | 151,456 files | 13 apps to a Hetzner Storage Box using SSHFS and rsync in 27 min 26 sec. But that wasn't the initial backup.
-
Thanks for the responses. We are particularly interested in de-duplication, does anyone know if Cloudron backing up to a Hetzner Storage Box will do de-duplicated backups? I was surprised when Backblaze didn't, but maybe I configured something wrong?
-
Depending on your appetite for loss, I would consider backups-in-depth. That is, one backup site is not a backup.
- Use rsync-based backup over SSHFS to Hetzner or similar. You will want to select "use hardlinks" and, if you want it, encryption. The use of hardlinks is, essentially, your de-duplication. (See below.)
- For a second layer of depth, I would consider a (daily? weekly? monthly?) backup of your primary backup site to a secondary. This could be a sync to AWS S3, for example. Note that any S3-compatible backup (B2, Cloudflare R2, etc.) will have both a storage cost and an API cost. If you are dealing with millions of small files in your backups, the API costs will become real, because dedupe requires checking each object and then possibly transferring it (multiple PUT/GET requests per file).
- S3 has the ability to automatically keep multiple versions of a file. You could use this to have an in-place rotation/update of files.
- If you are doing an S3 backup, you can use lifecycle rules to automatically move your S3 content to Glacier. This is much cheaper than "hot" S3 storage. But, you pay a penalty if you download/delete too early or too often.
- As a third, cheap-ish option, get a 2- or 4-bay NAS that can run TrueNAS, and put a pair of 8-12 TB HDDs in it. Configure the disks as a ZFS mirrored pair. Run a cron job once per day/week to pull down the contents of the Hetzner box. (Your cron job will, again, want to use rsync with hardlinks.) You now have a local machine mirroring your hot backups. It is arguably more expensive than some other options (~600 USD up front), but you don't have any "we might run out of space" issues. And, because you're using it to pull, you don't have any weird networking problems: just SCP the data down. (Or rsync it down over SSH.)
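As an illustration of the Glacier point above, here is a minimal sketch of an S3 lifecycle rule. The bucket name, prefix, and day counts are placeholders, not anything from this thread:

```shell
# Sketch of an S3 lifecycle rule (bucket/prefix/day counts are placeholders):
# transition objects under backups/ to Glacier after 30 days, and expire
# noncurrent versions (if S3 versioning is on) after 180 days.
cat > /tmp/lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-old-backups",
      "Status": "Enabled",
      "Filter": { "Prefix": "backups/" },
      "Transitions": [ { "Days": 30, "StorageClass": "GLACIER" } ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 180 }
    }
  ]
}
EOF
# Applied (for real, with credentials configured) via:
#   aws s3api put-bucket-lifecycle-configuration \
#       --bucket my-backups --lifecycle-configuration file:///tmp/lifecycle.json
echo "lifecycle rule written"
```

The noncurrent-version expiration pairs with the versioning point above: old versions stay around for a bounded window rather than forever.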
Whatever you are doing, consider targeting two different destinations at two different times (per day/alternating/etc.). Or, consider having some combination of backups that give you multiple copies at multiple sites. That could be Hetzner in two regions, with backups run on alternating days, or it could be you backup to a storage box and pull down a clone every day to a local NAS, or ... or ...
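The "pull a clone down to a local NAS every day" option above can be a one-line cron entry. Everything here (user, host, paths, schedule) is a placeholder:

```shell
# Crontab entry (crontab -e) on the NAS: every night at 03:00, pull the
# storage box down over SSH. -H preserves hardlinks, so the deduped
# snapshots stay deduped locally. User/host/paths are hypothetical.
0 3 * * * rsync -aH --delete -e ssh u123456@u123456.your-storagebox.de:backups/ /tank/backups/
```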
Ultimately, your 150GB is small. If you're increasing by a few GB per week, you're saying that you are likely to have 1TB/year. Not knowing your company's finances, this is generally considered a small amount of data. Trying to optimize for cost, immediately, is possibly less important than just getting the backups somewhere.
Other strategies could involve backing up to the NAS locally first, and then using a cron job to borg or rsync to a remote host (possibly more annoying to set up), etc. But you might have more dedupe options then. (borg has dedupe built in, I think, but...) I have a suspicion that your desire to use object storage might be a red herring. But, again, I don't know your constraints/budget/needs/concerns.
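For reference, a typical borg workflow looks roughly like this (repo path, source path, and retention policy are all placeholders, and borg does do chunk-level dedupe across archives):

```shell
# One-time: create an encrypted repository on the remote host.
borg init --encryption=repokey ssh://backup@nas.example/./repo

# Daily: create a dated archive; unchanged chunks are deduped, not re-stored.
borg create --stats ssh://backup@nas.example/./repo::'{now:%Y-%m-%d}' /data

# Keep a bounded history.
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
    ssh://backup@nas.example/./repo
```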
Deduplication: If you use rsync with hardlinks, then each daily backup will automatically dedupe unchanged files. A hardlink is a pointer to a file. So, if you upload super_ai_outputs_day_1.md to your storage on Monday, and it remains unchanged for the rest of time, then each subsequent day is going to be a hardlink to that file. It will, for all intents and purposes, take up zero disk space. So, if you are backing up large numbers of small-to-medium sized files that do not change, SSHFS/rsync with hardlinks is going to naturally dedupe your unchanging old data.

This will not do binary deduplication of different files. So, if you're looking for a backup solution that would (say) identify that two 1 GB files have an identical middle 500 MB, and somehow dedupe that... you need more sophisticated tools and strategies. rsync with hardlinks just makes sure that the same file, backed up every day, does not take (# of days × size) of space. It takes just the original size of the file plus a directory entry for each link.
Note, though, that if you copy a snapshot of your hardlinked backups to an object store, each day's snapshot may take the full size of every file. I'm possibly wrong on that, but I'm not confident that most tools would know what to do with those hardlinks when copying to an object store. I think you'd end up multiplying your storage usage significantly, because your backup tool will have to upload each file as a separate object. (Most object stores have no notion of symlinks/hardlinks.) An experiment with a subset of the data, or even a few files, will tell you the answer to that question.
If you have other questions, you can ask here, or DM me.
-
MinIO is back:
https://blog.vonng.com/en/db/minio-resurrect/
I see in the HN conversation that people are doubting the author, and some others mention that Chainguard will also keep a fork with the CVE patched: https://github.com/chainguard-forks/minio
Maybe it's worth waiting a bit to see which fork gets consistent maintenance.
-
I am in a similar position. I currently use iDrive e2 for backups and it’s fine but it does take around 1.5 hours uploading tarballs from my server. I’m looking at possibly deploying a low-budget Kimsufi server in the same OVH data centre and just mounting that disk as SSHFS to Cloudron on my primary server, haven’t tried it out yet. If I go this way I will likely still keep iDrive as a second backup destination and just run it a little less frequently and with lower retention to save on costs a little bit there.
I’m wondering about MinIO alternatives, as I tried MinIO on a second Cloudron install but it seemed to take even longer than uploading to iDrive e2 somehow (I expected it’d be quicker, not slower). The project seems dead too, but it also looks like there’s an active fork that maybe the Cloudron @staff can look into using instead. It brings back many of the lost MinIO features, by the sounds of it.
Thinking of other avenues to keep backups more “local” or as close to local as possible for rapid quick backups, and then completely offsite as a second backup plan too.
I have around 65 GB compressed to back up, around 125 GB uncompressed, I believe.
-
I haven’t used Garage yet, but isn’t it just another S3-compatible store? So it’d basically be a MinIO replacement, right? Do we have any other options for “hardlinks” using rsync? I kind of think the Surfer app would honestly be a great backup target somehow, if it could be used to expose a disk.