Disk space should never bring a whole server down
-
@girish said in Disk space should never bring a whole server down:
Managed to bring it up by truncating many logs
Is this perhaps related to the issue I reported a little while back too, regarding the logrotate not running properly under certain circumstances?
-
Going to trigger a move of Confluence to the mounted volume. It's 4.5GB, with 7.5GB free on the main volume now, so hopefully that's enough working space. But I have to zzz; problems I don't immediately know how to solve are kinda exhausting.
-
@marcusquinn looks like things are back up! There is ~7GB left, so hopefully that should hold up for some time.
-
I am looking into some clues on what can be done to mitigate this, will report back. BTW, the volume suggestion is possible. In fact, we used to do this very long ago, with each app having its own btrfs partition. The catch is that people usually start with a simple VPS, so for this to work out of the box one has to create a loopback file system, which is very slow. Also, when I logged in to your server, it was mysql that was down; it was not happy with the lack of disk space.
I am wondering if the solution involves suggesting a specific kind of setup to users who want to protect themselves against this kind of issue. That is totally doable (for example, suggest that the user move platformdata and boxdata to a separate volume/disk post installation).
-
@robi We actually have a disk space alert, in fact, it's there right now in the dashboard.
But the above is not super useful because it's just checking space in a cronjob. This cronjob is quite conservative because we don't want to keep spinning the disk too much. I am not aware of a way to get a "signal" from the server when disk space limits are hit. If a server fills up too fast between cron runs, the whole thing is useless...
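To make the idea concrete, a periodic check of this kind could be sketched in Python roughly as below. This is only an illustration, not Cloudron's actual code; the names `percent_used` and `check_disk` and the 90% default are made up for the example.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total (0.0 if total is 0)."""
    return 100.0 * used / total if total else 0.0


def check_disk(path: str = "/", threshold_percent: float = 90.0) -> bool:
    """Return True when usage at `path` is at or above the threshold.

    The 90% default is an arbitrary value chosen for this sketch.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return percent_used(usage.total, usage.used) >= threshold_percent


if __name__ == "__main__":
    if check_disk("/"):
        print("WARNING: disk usage is at or above the threshold")
```

A check like this only sees the disk at the moment it runs, which is exactly the weakness described above: anything that fills the disk between two runs goes unnoticed.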
-
I've triggered some bigger app data moves to the mounted 1TB volume but it seems to have chewed through 3GB of the remaining free space on the main volume already and I'm back to "Cloudron is offline. Reconnecting". Probably just making hasty tiredness errors now.
-
@marcusquinn maybe it's best to move them by hand first. Can you email me the list of apps you want to move? I can move them by hand, since this seems to keep hitting a wall, i.e. free space -> try to free space -> run out of space and start over...
-
@girish yes, but does it email you when approaching the threshold?
threshold setting? (twice a day should be plenty)
action setting checkboxes? (maybe a custom one too?)
Heck, even deleting a non-critical app would be fine, since it's restorable from backup.
-
@marcusquinn Hang in there. Bon courage.
-
WHM has disk space limitations. Is it possible to copy their method and have it implemented in CR?
-
Thanks for all the help - I managed to get some extra hands on deck this morning and we're moving lots of data to a mounted volume for much more headroom.
I still think it's a little too vulnerable having this hazard able to bring a server down.
Also, I couldn't see if there's a way to set Email storage to be a mounted volume too?
-
@girish Also, the current warning is IMO not very useful if the threshold is not configurable. Depending on how the server is used, a few GB may be enough for weeks, or for mere hours if there's media stuff on the server, or if a user uploads stuff on nextcloud or something.
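One way to picture why a fixed number of free GB means different things on different servers is to estimate the time left from the recent rate of consumption. A rough Python sketch, assuming two free-space samples taken some hours apart (the function name `hours_until_full` is hypothetical):

```python
from typing import Optional


def hours_until_full(free_before: int, free_after: int,
                     interval_hours: float) -> Optional[float]:
    """Estimate hours until the disk is full from two free-space samples.

    `free_before` and `free_after` are free space in the same unit,
    sampled `interval_hours` apart. Returns None when free space is
    not shrinking, since no estimate is possible then.
    """
    consumed = free_before - free_after
    if consumed <= 0 or interval_hours <= 0:
        return None
    rate = consumed / interval_hours  # units consumed per hour
    return free_after / rate
```

For example, if free space dropped from 10GB to 7GB over one hour, the estimate is 7/3, roughly 2.3 hours, whereas the same 7GB would last for weeks on a server that only writes a few MB a day. This is a linear extrapolation, so it is only a hint, not a prediction.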
-
@marcusquinn said in Disk space should never bring a whole server down:
Also, I couldn't see if there's a way to set Email storage to be a mounted volume too?
Currently, emails are part of boxdata and you need to move the boxdata entirely. I’ve done this on my current server due to the amount of email stored for my clients. The steps for this are at https://docs.cloudron.io/storage/#default-data-directory for reference.
I’m assuming that by volume you meant an external disk, rather than the actual Volumes function that Cloudron has.
I believe there is a feature request to keep emails separate, but boxdata really doesn’t contain much data other than emails, so it’s doable as-is for now. It’d just be nice to see the GUI handle moving the email data much like it does for apps.
-
Anyone know where /app/data actually is in the full file system structure?
I'm trying to navigate a snapshot clone to see if it has the config.php file that is still missing for EspoCRM, but I'm not seeing anything obvious, and searching the docs hasn't given me a clue.
-
The problem I have is that EspoCRM Administration writes changes back to
/app/data/data/config.php
- however, that file also contains all the database connection details, password hash, basically everything for that instance to work. So when the disk was full, it seems to have somehow written a 0kb version of config.php.
And because the rsync encryption was failing to back up EspoCRM, the Cloudron backups aren't complete.
So that leaves restoring a provider backup snapshot and digging around.
Basically, whatever anyone does - never allow the disk to get full - the cascade of problems that can happen from that interruption is just one massive time hole.
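For what it's worth, the usual defence against a 0kb file like this is to write the new contents to a temporary file and only then rename it over the original. A minimal Python sketch of that general technique follows; EspoCRM itself is PHP and this is not how it writes its config, and `atomic_write` is a name invented for the example.

```python
import os
import tempfile


def atomic_write(path: str, data: str) -> None:
    """Replace `path` with `data`, keeping the old file if the write fails."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives in the same directory so the final rename stays
    # on one filesystem. Note: mkstemp creates it with mode 0600.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # a full disk raises OSError here or earlier
        os.replace(tmp_path, path)  # swap in the complete file in one step
    except BaseException:
        # Something failed before the swap: drop the partial temp file
        # and leave the original untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern a full disk makes the write fail with an error, but the existing config.php keeps its previous contents instead of being truncated.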
-
@marcusquinn Holy sh*t, with some dumb-luck trying everything I know, I seem to have fixed it.
Lesson learnt - never run out of disk space - Sod's law says it will be the apps you rely on the most that will get corrupted.
Now, given the many open ways to load up a Cloudron with data (email/FilePizza/PrivateBin) maybe there's a way to avoid this causing a total fail?
-
@marcusquinn said in Disk space should never bring a whole server down:
Now, given the many open ways to load up a Cloudron with data (email/FilePizza/PrivateBin) maybe there's a way to avoid this causing a total fail?
I think FilePizza is fully P2P, so I'm not sure you could fill the server up with that (but you could with Jirafeau).
But yeah, I reckon configurable disk space notifications (e.g. email/notify me hourly/daily/whatever once I've only got x space left) would be a good first step to help this not to happen.
-
Quick fix idea: maybe 70% full is a better nag threshold?