I appreciate your input, @robi . I'm hoping to hear from other packagers and the product team. Your questions/comments illuminate the kinds of things I think need to be considered in this package, so I'll use this as an opportunity to capture them more fully; I've woven some packaging questions in along the way for the Cloudron team and other packagers.
> Since you have Caddy, it can also proxy to the admin port via /admin instead, no?
>
> Same for /api to the API port & KV store.
I don't think this is a good idea. The Administrative API may (now, or in the future) have an /admin path that maps to some behavior. To override/rewrite pathways on an API is to fundamentally change the present/future behavior of the API. Given that Cloudron is an infrastructure platform, this means that I would be packaging a "broken" version of Garage (because, by rewriting pathways, I'd be changing the behavior of Garage). It might not be broken now, but I would be setting up a future where it becomes broken. So, I think rewriting pathways on any of the ports (and, in doing so, altering the behavior of Garage) is a bad idea.
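To make the collision hazard concrete with a contrived example (the endpoint names here are invented; the point is only about the path namespace):

```bash
# Suppose the proxy rewrites /admin/* -> /* before forwarding to the admin port.
# Today, this request works fine:
curl -H "Authorization: Bearer $ADMIN_TOKEN" \
  https://garage.example.com/admin/v1/status    # forwarded upstream as /v1/status

# But if a future Garage release defines its own /admin/... route, that route
# becomes unreachable through the proxy, because the rewrite strips the prefix
# before Garage ever sees it.
```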
Adding the K/V store is trivial. I just haven't done it.
I stand by my assertion that the CloudronManifest needs to be expanded:
- I cannot use `httpPorts` to insert a wildcard into the DNS, because it claims that `*.web.domain.tld` is not a proper domain, and
- I cannot insert aliases via the manifest (though I can via `cloudron install`), and aliases do allow me to insert A records into DNS.
So, either `httpPorts` needs to allow me to insert wildcard domains, or I need an `aliases` entry in the manifest (preferably mapped to an array). There may be another way, but I think that altering application behavior, especially for an S3-compatible API and administrative server, is the wrong approach. I also think including instructions that tell users to add aliases manually is a bad approach, but... at least there is precedent (e.g. packages that ship with default user/pass combos like admin/changeme123).
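For concreteness, this is roughly the manifest shape I am asking for; the `aliases` key does not exist in the manifest schema today, it is the hypothetical extension I am proposing:

```json
{
  "aliases": ["*.web.domain.tld"]
}
```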
> SQLite is pretty robust running on billions of devices rather invisibly, so I wouldn't worry too much. Like you said a snapshot of it could be useful, but the use case may yet need to be discovered.
>
> Cloudron backup takes care of everything in /app/data so there will be a nested redundancy if used for backup.
Perhaps this is a bit direct, but I'm going to state it this way: you are a bit casual with my data. And, for that matter, with the data of everyone who relies on Cloudron or (say) the package I am working on. My premise is that if someone is using this package, they should be able to trust that their data will not be corrupted or lost by the normal day-to-day operation of Cloudron, including a package update or upgrade. That happens through careful thought and engineering.
Two links that I think are educational in this regard:
Because SQLite writes temporary files (journal/WAL files) while in operation, you cannot simply copy the database file. I assume that @girish and @nebulon took these kinds of things into account when they introduced the `sqlite` addon under `localstorage`. But I don't know. I'm confident they, or someone, will help answer my uncertainty.
Therefore, my question is: if I have indicated that the metadata database that Garage relies on is specified in `localstorage`, can I be confident that the standard Cloudron backup mechanism will properly back up that SQLite file, even if it is in use? For the Garage package, I have included it in my `localstorage` property, but I don't know what happens when I do that, because the packaging/manifest docs are not very specific about what including it means.
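For what it's worth, the safe way to copy a live SQLite file is to let SQLite produce the copy itself. A minimal sketch, assuming the `sqlite3` CLI is present in the container; the database path is one I made up:

```bash
# .backup uses SQLite's online-backup API, so the copy is consistent even
# while Garage has the database open. Paths are illustrative only.
sqlite3 /app/data/garage/garage.db ".backup '/tmp/garage.db.snapshot'"
```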
I also know I need to handle database migrations. When Garage goes from v2.1.0 to v2.2.0 (or, more likely, v2 to v3), it might make changes to how it structures its metadata.
These things are my responsibility as a packager. Because Cloudron does not appear to provide any "upgrade" hooks, I believe I need to write my startup scripts to always assume that the current boot could be an upgrade, whether minor or major. And because damage to or loss of that SQLite file means losing all of the data stored by Garage, it is important to get the start/restart sequence right.
Coupled with my current uncertainty about how SQLite is handled by Cloudron, I don't even know whether I can roll back to a Cloudron snapshot safely.
Cloudron says this:
> Sqlite files that are actively in use cannot be backed up using a simple `cp`. Cloudron will take consistent, portable backups of Sqlite files specified in this option.
So: I am assuming that Cloudron does the right thing w/ SQLite files w.r.t. backups.
Garage suggests that for minor upgrades, a `garage repair` before upgrading is enough, and that for major upgrades, more is necessary. However, the package doesn't "know" whether an upgrade is minor or major. (I do, but the package doesn't... unless I build that logic into the package startup; see the sketch below.) So, I suspect I need to pretend, on every boot, that it might be a major upgrade. (For packagers: is this the right strategy/assumption?)
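A sketch of how the startup script could at least distinguish the two cases, by persisting the last-booted version in /app/data; the version-string handling is an assumption about what `garage --version` prints:

```bash
#!/bin/bash
# Sketch: detect minor vs. major upgrades at boot by comparing the packaged
# Garage version against the one recorded on the previous boot.
set -euo pipefail

CURRENT="$(garage --version | awk '{print $2}')"  # output format is an assumption
STATE=/app/data/.last-garage-version
PREVIOUS="$(cat "$STATE" 2>/dev/null || echo "$CURRENT")"

major() { echo "${1#v}" | cut -d. -f1; }

if [[ "$(major "$CURRENT")" != "$(major "$PREVIOUS")" ]]; then
  echo "major upgrade: $PREVIOUS -> $CURRENT (snapshot + migrate required)"
else
  echo "same major version: repair alone should be enough"
fi
echo "$CURRENT" > "$STATE"
```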
This is what Garage recommends for major upgrades:
- Disable API access (for instance in your reverse proxy, or by commenting the corresponding section in your Garage configuration file and restarting Garage)
- Check that your cluster is idle
- Make sure the health of your cluster is good (see `garage repair`)
- Stop the whole cluster
- Back up the metadata folder of all your nodes, so that you will be able to restore it if the upgrade fails (data blocks being immutable, they should not be impacted)
- Install the new binary, update the configuration
- Start the whole cluster
- If needed, run the corresponding migration from `garage migrate`
- Make sure the health of your cluster is good
- Enable API access (reverse step 1)
- Monitor your cluster while load comes back, check that all your applications are happy with this new version
Now, some of this comes "for free" when a package is being upgraded, because I will be doing this at startup (in my `start.bash`), which is before the garage service is running. Therefore, I can take as "given" the things that involve being in a shut-down state. Here is that list again, with the steps I get for free struck through:
- ~~Disable API access (for instance in your reverse proxy, or by commenting the corresponding section in your Garage configuration file and restarting Garage)~~
- ~~Check that your cluster is idle~~
- Make sure the health of your cluster is good (see `garage repair`)
- ~~Stop the whole cluster~~
- Back up the metadata folder of all your nodes, so that you will be able to restore it if the upgrade fails (data blocks being immutable, they should not be impacted) (note: this probably means `garage meta snapshot --all`)
- Install the new binary, update the configuration
- ~~Start the whole cluster~~
- If needed, run the corresponding migration from `garage migrate`
- ~~Make sure the health of your cluster is good~~
- ~~Enable API access (reverse step 1)~~
- ~~Monitor your cluster while load comes back, check that all your applications are happy with this new version~~
Every time the container starts, I think I need to do a `garage repair`. I think I also need to follow the steps above (snapshot, migrate, etc.). This way, if I rebuild the container and go from v2.1.0 to v3.0.0, I am guaranteed that I have (1) repaired the database recently, (2) taken a snapshot that is robust/safe/not in active operation, and (3) migrated to the most recent table schema. This should ensure that, when I start `garage serve`, the application is working against the right table schemas.
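A minimal sketch of what that sequence might look like in `start.bash`. I am assuming the garage admin commands (status, repair, meta snapshot) talk to the locally running node over RPC, so the daemon has to come up first; the exact subcommand names and flags vary between Garage versions and need to be verified against `garage help`:

```bash
#!/bin/bash
# Sketch only: an upgrade-defensive boot that treats every start as a possible
# upgrade. Subcommands and flags are from memory; verify them per version.
set -euo pipefail

garage server &                        # bring the node up (S3/API still fenced off upstream)
GARAGE_PID=$!

# wait until the node answers admin commands
until garage status >/dev/null 2>&1; do sleep 1; done

garage repair --yes tables             # 1. repaired recently
garage meta snapshot --all             # 2. consistent metadata snapshot (per the Garage docs above)
# 3. on a major version bump, `garage migrate` would run here as well

wait "$GARAGE_PID"                     # hand the foreground back to the daemon
```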
What concerns me is the configuration. I am likely going to need to pay attention to updates/upgrades, and determine if (say) a v2.1.0 configuration would break a v3.0.0 deployment. If so, then I'm going to (possibly) package both the v2.1.0 and v3.0.0 binary into the image, and "fall back" to the v2 binary when a v2 configuration on a user's installation is detected. That way, I can prompt users as part of the upgrade to make sure to go in, update their configs, and reboot. (Or, something.)
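Roughly how I picture that fallback, with an invented config probe; the grep'd key is a stand-in for whatever actually distinguishes a v2-era configuration:

```bash
# Hypothetical: ship both binaries in the (read-only) /app/code and pick one
# based on the era of the user's config. The probed key is made up.
if grep -q '^some_v2_only_key' /app/data/garage.toml; then
  GARAGE_BIN=/app/code/garage-v2       # old-style config: stay on v2, prompt the user to migrate
else
  GARAGE_BIN=/app/code/garage-v3
fi
exec "$GARAGE_BIN" server
```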
This is not yet a problem, but it is directly tied to the SQLite database, which is effectively the Garage filesystem: loss or corruption of that DB is loss of data.
> Does Garage have any S3 Gateway features?
I don't think so. If you need one, you can package versitygw, which is dedicated to that purpose.
A question for packagers: how do we test packages? Does anyone have strategies for automating testing of package development/updates/upgrades?
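To seed that discussion: the loop I run by hand looks roughly like the following, and seems like a candidate for scripting. `cloudron build`, `cloudron install`, and `cloudron update` are the standard CLI; the aws-cli probe and the location name are my own:

```bash
# Rough manual test loop, as a candidate for automation.
cloudron build                                   # build and push the app image
cloudron install --location garage-test         # exercise the fresh-install path
aws --endpoint-url https://garage-test.example.com s3 ls   # is the S3 API answering?
cloudron update --app garage-test                # exercise the update/upgrade path
```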