Idea: Reserve data disk to be cleared upon emergency "Disk full"
-
AmbroiseUnly wrote on Jun 29, 2024, 6:24 AM, last edited by AmbroiseUnly Jun 29, 2024, 6:28 AM
I've run into quite a lot of "disk full" issues recently, and I'm not alone (thankfully).
I wonder whether Cloudron would benefit from optionally reserving part of the disk with "fake data", essentially locking away a bit of the space available on the main hard drive.
That might sound dumb, but it has benefits.
The main goal of such a feature would be to avoid the situation where:
- There is no disk space left
- We can't free any, because we don't know what data we could delete
- Or, we know what data could be deleted, but to do so we need to be able to write a tiny bit first (like what happened there)
- Due to the disk being full, we have very little flexibility about what we can do, and simple solutions are out of our reach
From experience, it's very hard to recover from that issue when the disk is fully saturated, unless you know exactly what data can be removed safely and how to do it, or can rescale your server on the fly. You usually learn about these options while trying to fix the whole thing.
Because the reserve makes the disk fill up a bit sooner, it also helps users discover the "Disk full" issue a bit sooner.
But when the disk is really full, and we don't know what to delete, we are really stuck. We can't even do a clean stop of active apps and run a Cloudron backup of the system, because there is no more disk space to do so.
On the other hand, by locking away a bit of the disk (enough to allow performing operations, but not so much that it becomes a hindrance) and releasing it upon request, Cloudron makes the following sequence possible:
- Cloudron crashes as usual when the disk is full
- Admin becomes aware of the "disk full" issue (usually not immediately, but when online services stop working)
- Admin can use "Release unused disk space" (the new feature I'm talking about)
  - We might also provide a few strategy tips there, such as stopping the apps, releasing the reserve, and then running a backup
- Cloudron releases some space by removing its "reserve"
- Disk is no longer entirely full and Admin can now run some operations, such as:
  - Run a Cloudron Backup
  - Restart the apps (if they were stopped or errored due to no disk space left)
This solution could provide both awareness of the root issue, and enough time/space to find a proper solution while continuing operations.
Typically, for a server of 40GB, I would imagine a "reserve" of 1-5GB, configurable by the Admin, with 1GB as a possible default. It's essentially a way of asking our future self: "how much time/space do you want to have to find a proper solution, once you've figured out you didn't anticipate your disk being saturated and have to rely on the reserve, before you run out of space again?"
In a way, it's a bit like how cars work. They don't just suddenly stop when you run low on fuel. They keep going, but they vehemently signal that you're on the reserve and need a refill real quick. Cloudron doesn't have a native monitoring system for the hard drive, but could give us an alternative that isn't as complex to implement.
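The reserve idea described above could be sketched roughly as follows. This is only an illustration, not an existing Cloudron feature: the helper names `create_reserve` and `release_reserve` are hypothetical, and it assumes that a plain file filled with zero bytes is enough to hold the space back.

```python
import os

CHUNK = 1024 * 1024  # write in 1 MiB chunks to keep memory use small


def create_reserve(path: str, size_bytes: int) -> None:
    """Create a reserve file of size_bytes at path (hypothetical helper).

    The bytes are actually written (not a sparse file), on the assumption
    that this makes the filesystem allocate the space.
    """
    zeros = bytes(CHUNK)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches the disk


def release_reserve(path: str) -> int:
    """Delete the reserve file and return the number of bytes it held.

    Returns 0 if the reserve was already released.
    """
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return 0
    os.remove(path)
    return size
```

For example, `create_reserve("/path/to/reserve.bin", 1024**3)` (the path is a placeholder) would hold back about 1GB, and a "Release unused disk space" action would boil down to `release_reserve("/path/to/reserve.bin")`. Two caveats apply to this sketch: on filesystems with transparent compression, a file of zeros may not occupy its full size, and on copy-on-write filesystems deleting a file can itself need some free space.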
-
timconsidine (App Dev) wrote on Jun 29, 2024, 6:47 AM, last edited by timconsidine Jun 29, 2024, 6:49 AM
Hmmm
I think it is better to put resources into stopping the problem rather than "closing the stable door after the horse has already bolted".
As I've posted before, it's trivial to set up a bash script using ntfy which runs on a cron job.
I have 2 scripts:
- one runs daily and sends a ntfy alert of disk free info (and also docker containers exited)
- one runs hourly and sends a ntfy alert when disk used exceeds a set %
Some dashboards (if you use them), e.g. homepage, report disk usage stats, so it's in front of you throughout the day as you use your dashboard.
While I'm sympathetic to the problem (it's a bummer when it happens), I have not had a "disk full" in over 10 years. It's pretty easy to stay on top of the situation, and much easier to be aware in advance than to deal with it afterwards.
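The hourly threshold check described above could look something like the sketch below. It is written in Python rather than bash and is not the poster's actual script. The threshold value, the `NTFY_URL` environment variable, and the function names are assumptions made for illustration. The only ntfy behaviour relied on is that a plain HTTP POST to a topic URL publishes the request body as the message, with an optional `Title` header.

```python
import os
import shutil
import urllib.request

THRESHOLD_PERCENT = 85.0  # assumed default; pick what suits your server
WATCHED_PATH = "/"        # filesystem to watch


def used_percent(path: str) -> float:
    """Return the percentage of the filesystem at path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def should_alert(percent_used: float, threshold: float) -> bool:
    """True when usage has reached or passed the threshold."""
    return percent_used >= threshold


def build_message(path: str, percent_used: float) -> str:
    return f"Disk usage on {path} is at {percent_used:.1f}%"


def send_ntfy(url: str, message: str, title: str) -> None:
    """Publish message to an ntfy topic URL via HTTP POST."""
    request = urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        headers={"Title": title},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


if __name__ == "__main__":
    percent = used_percent(WATCHED_PATH)
    if should_alert(percent, THRESHOLD_PERCENT):
        message = build_message(WATCHED_PATH, percent)
        # NTFY_URL is an assumed variable holding the full topic URL;
        # without it the script only prints (dry run).
        url = os.environ.get("NTFY_URL")
        if url:
            send_ntfy(url, message, "Disk almost full")
        else:
            print(message)
```

Run hourly from cron, for instance with an entry like `0 * * * * NTFY_URL=<your topic URL> python3 /path/to/disk_alert.py` (the script path is a placeholder), this would send one notification per hour for as long as usage stays above the threshold.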
-
AmbroiseUnly wrote on Jun 29, 2024, 6:53 AM, last edited by AmbroiseUnly Jun 29, 2024, 6:55 AM
As with most things, it's easy when you know exactly how to do it.
But being a developer, not a sysadmin, not knowing the tools, and noticing how little Cloudron guides us to a proper, production-grade setup, this is a real issue for me.
Discoverability and guidance on this topic are clearly lacking. IMHO, Cloudron should come with basic monitoring capabilities (opt-in), for those who don't want to, or don't know how to, set up a custom monitoring system by themselves.
And that could be as simple as what you mention, a few scripts that send notifications through email/webhooks upon some events.
-
Understood.
As a developer, I'm sure a couple of bash scripts are within your competence, and I (and others) have posted suggestions before. I can post them again if you would like, or you can search for 'disk full' or 'ntfy'.
Ntfy is available as an app here in the Cloudron App Store, and cron is available in any installed app, so it could be argued that Cloudron does provide tools already.
I'm not aware of any other VPS provider or self-host environment which proactively does this kind of alerting. Not to say that it would not be a nice added feature, just saying that it is already possible on Cloudron with minimal work.
If you don't want to self-host ntfy because it would use an app in a free 2-app Cloudron deployment, I think there is still a free ntfy hosted service (would need to be checked).
-
@timconsidine Thanks for the tip, but I'm not confident in my own ability to maintain this kind of manual thing across a range of servers.
I spent my Saturday experimenting with various monitoring solutions, and found Netdata to be exactly what I needed.
I wrote a guide about it, and I hope you'll find it useful!