Garage, an open-source distributed storage service you can self-host to fulfill many needs
-
https://garagehq.deuxfleurs.fr/
An alternative to Minio, especially made for self-hosting setups.
See: https://garagehq.deuxfleurs.fr/blog/2022-introducing-garage/
Code: https://git.deuxfleurs.fr/Deuxfleurs/garage
-
@ruihildt this sounds absolutely brilliant, thanks for sharing!
-
Too bad this is only replicated storage. (from what I can tell from a look at the front page)
Minio can use error correcting codes instead of replication.
-
@robi said in Garage, an open-source distributed storage service you can self-host to fulfill many needs:
Too bad this is only replicated storage. (from what I can tell from a look at the front page)
Minio can use error correcting codes instead of replication.
@infogulch said in Garage, an open-source distributed storage service you can self-host to fulfill many needs:
@robi Yeah unfortunately.
Non-goals include:
- ...
- Erasure coding (our replication model is simply to copy the data as is on several nodes, in different datacenters if possible)
What's the difference, or the advantage of error correcting / erasure coding?
-
@jdaviescoates You can read about it in many places online that compare the two architectures and implementations.
EC is more space-efficient, especially at large scale (petabytes, exabytes, etc.), while replication always uses 100% more storage per replica and doesn't guarantee bit-flip protection, so it doesn't scale as well.
EC can always recompute any missing pieces and self-heal or regenerate the data.
That's why you can hear every word when speaking with someone on the phone from across the planet: many lost packets are recomputed to fill in the gaps instead of requiring long retransmits.
The same applies to huge storage systems running on continuously failing hardware.
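To put rough, made-up numbers on it (a sketch only; the 10+4 Reed-Solomon layout below is just an example and not something Garage implements):

```sh
# Rough storage-overhead comparison (illustrative numbers, not from Garage):
# - 3-way replication: every object is stored 3 times and survives the loss
#   of any 2 copies.
# - Reed-Solomon style EC with 10 data + 4 parity shards: survives the loss
#   of any 4 shards while storing far less extra data.
echo "3-way replication: $((3 * 100))% of the raw data size"
echo "EC (10 data + 4 parity): $(( (10 + 4) * 100 / 10 ))% of the raw data size"
```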
-
@robi said in Garage, an open-source distributed storage service you can self-host to fulfill many needs:
@jdaviescoates You can read about it in many places online that compare the two architectures and implementations.
EC is more space-efficient, especially at large scale (petabytes, exabytes, etc.), while replication always uses 100% more storage per replica and doesn't guarantee bit-flip protection, so it doesn't scale as well.
EC can always recompute any missing pieces and self-heal or regenerate the data.
That's why you can hear every word when speaking with someone on the phone from across the planet: many lost packets are recomputed to fill in the gaps instead of requiring long retransmits.
The same applies to huge storage systems running on continuously failing hardware.
You have a gift for explaining things.
-
@ruihildt said in Garage, an open-source distributed storage service you can self-host to fulfill many needs:
https://garagehq.deuxfleurs.fr/
An alternative to Minio, especially made for self-hosting setups.
See: https://garagehq.deuxfleurs.fr/blog/2022-introducing-garage/
Code: https://git.deuxfleurs.fr/Deuxfleurs/garage
I haven't checked out this suggestion yet, but if it is like what I have in mind, it could be brilliant for Cloudron.
Cloudron is all about self-hosting, and there are a huge number of gamers with outdated gaming systems that could be repurposed for Cloudron self-hosting. These machines would have large amounts of RAM (for games), a very fast SSD (for running the system) and also a large storage drive (to hold the games).
That sort of legacy hardware is, I think, absolutely ripe for Cloudron! The Garage software could give those old storage disks a new life!
-
Garage still doesn't support 100% of the S3 protocol.
And due to how it is written, modern NVMe will not get a BIG boost if you don't have a powerful CPU (it is CPU-intensive for big files like Cloudron backups, and still not well-performing with small files, like a Nextcloud full of small docs).
MooCloud is testing Garage, and we have an agreement with Garage's lead developer to publish the data we collect using Garage in a modern cloud environment.
-
@MooCloud_Matt Have you looked at SeaweedFS yet?
-
@robi
We did, but I don't recall the reason we excluded it as an alternative to Ceph (which we are using in prod currently).
-
@MooCloud_Matt said in Garage, an open-source distributed storage service you can self-host to fulfill many needs:
Garage still doesn't support 100% of the S3 protocol.
Seems they got most of it covered now:
https://garagehq.deuxfleurs.fr/documentation/reference-manual/s3-compatibility/
-
I started work on a package; this was under a different thread, but it is probably more appropriate to mention here:
-
update
I've poked the package with a stick, added more to the README, and have run into a few interesting things about this package.
s3 functionality
I was able to create a bucket and put stuff in it. This seems to be a core function of an S3-compatible object server.
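For the record, a rough sketch of the kind of commands I mean (the bucket and key names are made up, and the exact `garage key` subcommand differs a bit between Garage versions):

```sh
# Inside the container: create a bucket and an access key, then allow the
# key to read/write the bucket.
garage bucket create test-bucket
garage key create test-app-key   # older Garage releases: garage key new --name test-app-key
garage bucket allow --read --write test-bucket --key test-app-key

# From the outside, any S3 client pointed at the app's domain should work,
# e.g. the AWS CLI (with the key's credentials configured):
aws --endpoint-url https://s3.example.com s3 cp ./hello.txt s3://test-bucket/
aws --endpoint-url https://s3.example.com s3 ls s3://test-bucket/
```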
administrative server
Garage has a notion of having an administrative API at one port. I can use `httpPorts` to bind this port, and I can use it. For example, if I have `s3.example.com` as the root domain, then `admin.s3.example.com` can be the home for the Admin API. And, I was able to create an administrative API token, and using it, access the admin API.
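As a sketch of what "access the admin API" looks like in practice (the exact path depends on the Garage admin API version, so treat the details as assumptions):

```sh
# The admin token comes from the [admin] section of garage.toml (or however
# the package ends up generating it); the domain is the httpPorts one above.
ADMIN_TOKEN="xxxxx"   # placeholder

# Cluster status via the token-protected admin endpoints
# (path is /v1/status on recent versions, /v0/status on older ones):
curl -H "Authorization: Bearer ${ADMIN_TOKEN}" \
     https://admin.s3.example.com/v1/status
```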
static websites
This one is tricky. In theory, if I configure part of the `garage.toml` correctly:

```toml
[s3_web]
bind_addr = "[::]:3902"
# This wants to be set dynamically in the startup.
# That way, it can grab a Cloudron variable.
root_domain = ".web.s3.example.com"
index = "index.html"
```

I can serve static sites out of buckets. However, this implies domain name manipulation. And, possibly, a wildcard. As in, I'd like `*.web.example.com` to resolve to `s3.example.com`, and for Garage to pick it up (internally) on port `3902`. I have explored this manually (by manipulating DNS settings in Cloudflare), but even though I have configured a bucket to serve static content, I can't (yet) convince it to serve something up.
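One thing worth noting for anyone following along: besides the `[s3_web]` config and the DNS, the bucket itself has to be flagged for website access, roughly like this (bucket name made up):

```sh
# Allow Garage to serve the bucket over the web endpoint; it is then expected
# to be reachable as <bucket-name>.<root_domain> on the s3_web port.
garage bucket website --allow my-static-site

# Upload an index page with any S3 client, e.g.:
aws --endpoint-url https://s3.example.com s3 cp ./index.html s3://my-static-site/index.html
```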
While it might be that this functionality has to be sacrificed, I think it would be a nice way (if it was baked in) to manage 100% static sites. However, that would be new machinery: a way to map domains to buckets.
backups
I'm not convinced the backups are good. Specifically, `<path>/meta/db.sqlite` is the metadata database for the Garage instance. This is, as far as I can tell, all of the information about where all of the files are stored. Losing this database is tantamount to losing all of the data. I think. So, making sure it backs up correctly matters. And, it is clear that updates will need to do things like `garage repair` and `garage migrate`, in the event of migrations/changes to this metadata database.
Ah:
Since Garage v0.9.4, you can use the `garage meta snapshot --all` command to take a simultaneous snapshot of the metadata database files of all your nodes. This avoids the tedious process of having to take them down one by one before upgrading. Be careful that if automatic snapshotting is enabled, Garage only keeps the last two snapshots and deletes older ones, so you might want to disable automatic snapshotting in your upgraded configuration file until you have confirmed that the upgrade ran successfully. In addition to snapshotting the metadata databases of your nodes, you should back-up at least the `cluster_layout` file of one of your Garage instances (this file should be the same on all nodes and you can copy it safely while Garage is running).
(Emphasis mine.)
So, the backup process is something I'll need to investigate further. It might be that some manual/scripted management of this database file, and dumping it, is going to be needed to make the backups robust.
(Given that Cloudron does backups before upgrades, as long as the SQLite DB is snapshotted correctly on backup, I think it will be fine.) I suspect that a cron will need to be installed for this package that runs the snapshot command (daily?), rotates DBs, and makes those part of the backup. (I have a suspicion that Cloudron packages handle this kind of thing in the `start.sh` scripts?)
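A minimal sketch of what I have in mind for that cron/`start.sh` step, assuming the snapshot command from the docs quote above and a made-up data directory layout:

```sh
#!/bin/bash
# Hypothetical daily pre-backup step for the Cloudron package: snapshot the
# metadata DB so the regular Cloudron backup picks up a consistent copy.
set -eu

# Take a metadata snapshot (available since Garage v0.9.4, per the docs quote).
garage meta snapshot --all

# Keep a copy of the cluster layout alongside the snapshots (path is a guess;
# it depends on where the package puts metadata_dir).
cp /app/data/meta/cluster_layout /app/data/meta/cluster_layout.backup
```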
healthcheck URL
The manifest assumes that the health check URL is on the main app. In this case, if I have `s3.example.com` and the Admin API is at `admin.s3.example.com` (defined in `httpPorts`), I want the health check URL to be `https://admin.s3.example.com/health` because that is where Garage put it. I don't think I can do that with the manifest as-designed.
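For now I can at least verify that endpoint by hand from outside, e.g.:

```sh
# Manual check of the URL Cloudron's health check would ideally hit:
curl -i https://admin.s3.example.com/health
```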
summary
I think the package is off to a good start. I have questions, but most of them are described above, and I'll probably figure things out. The health check and static site subdomains, though, might not be easily solved.