Disk space should never bring a whole server down

scooke

@marcusquinn Hang in there, and good luck!

humptydumpty

WHM has disk space limits (per-account quotas). Is it possible to copy their approach and implement it in Cloudron?

marcusquinn

Thanks for all the help - I managed to get some extra hands on deck this morning and we're moving lots of data to a mounted volume for much more headroom.

I still think it's too risky that this one hazard can bring a whole server down.

Also, I couldn't see if there's a way to set Email storage to be a mounted volume too?

mehdi

@girish Also, IMO the current warning is not very useful as long as its threshold isn't configurable. Depending on how the server is used, a few GB of free space may last for weeks, or only for hours if the server hosts media or a user uploads files to Nextcloud or similar.
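A configurable threshold is easy to approximate outside Cloudron in the meantime. This is a minimal sketch, not a Cloudron feature: it assumes GNU `df` with POSIX output (`-P`), the function names are hypothetical, and the threshold is just a parameter.

```bash
#!/usr/bin/env bash
# Sketch of a disk-usage check with a configurable threshold.
# Function names and the 80% default are illustrative, not a Cloudron feature.

# Print the used percentage (integer, no "%" sign) of the filesystem holding $1.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (and print a warning) when usage of the filesystem holding $1
# is at or above $2 percent; fail quietly otherwise.
disk_over_threshold() {
  local path="$1" threshold="${2:-80}" used
  used=$(disk_used_pct "$path")
  if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: filesystem holding $path is ${used}% full (threshold ${threshold}%)"
    return 0
  fi
  return 1
}
```

For example, `disk_over_threshold / 70` prints a warning once the root filesystem is at least 70% full; the output could then be mailed or posted wherever you get notifications.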

d19dotca

@marcusquinn said in Disk space should never bring a whole server down:

Also, I couldn't see if there's a way to set Email storage to be a mounted volume too?

Currently, emails are part of boxdata, so you need to move the boxdata directory as a whole. I’ve done this on my current server because of the amount of email stored for my clients. For reference, the steps are at https://docs.cloudron.io/storage/#default-data-directory.
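Before moving boxdata, it's worth checking that the destination disk actually has room for it. A minimal sketch, assuming the default data directory is /home/yellowtent/boxdata (verify against the docs linked above) and using /mnt/newdisk as a hypothetical mount point for the new volume; the function name and the 10% safety margin are arbitrary choices.

```bash
#!/usr/bin/env bash
# Pre-flight check before moving a data directory (e.g. boxdata, which holds
# the email) to another disk. Paths are placeholders; the move itself should
# follow the official docs.

# Succeed if the contents of directory $1 fit into the free space of the
# filesystem holding $2, keeping an extra safety margin of $3 percent.
fits_on() {
  local src="$1" dest="$2" margin="${3:-10}" need avail
  need=$(du -sk "$src" | awk '{ print $1 }')
  avail=$(df -Pk "$dest" | awk 'NR == 2 { print $4 }')
  echo "need ${need}K (+${margin}%), available ${avail}K"
  [ $(( need * (100 + margin) / 100 )) -lt "$avail" ]
}
```

`fits_on /home/yellowtent/boxdata /mnt/newdisk` prints both numbers and succeeds only if the data plus margin fits. It does not perform the move.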

I’m assuming that by “volume” you meant an external disk rather than Cloudron’s actual Volumes feature.

I believe there is a feature request to keep emails separate, but boxdata really doesn’t contain much data other than emails, so it’s doable as-is for now. It’d just be nice to see the GUI handle moving the email data the way it does for apps.

marcusquinn

@d19dotca Thanks. I'm an app specialist, and anything more than a few minutes of digging in the dirt is my kind of hell. I'm getting brain fog now, as I've lost a bunch of important work and two days of progress on it.

marcusquinn

Does anyone know where /app/data actually lives in the host's file system?

I'm browsing a snapshot clone to see whether it still has EspoCRM's missing config.php file, but I'm not seeing anything obvious, and searching the docs hasn't turned up any clue.
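Partial answer, based on my understanding of the layout rather than anything confirmed in this thread: Cloudron appears to keep each app's /app/data on the host under /home/yellowtent/appsdata/<app-id>/data. Assuming that holds, a sketch like this can search a mounted snapshot for a file by its path relative to /app/data. /mnt/snapshot is a hypothetical mount point and the function name is made up; it needs GNU find for `-printf`.

```bash
#!/usr/bin/env bash
# Locate copies of an app file inside a mounted snapshot (or the live host).
# ASSUMPTION: /app/data maps to <root>/home/yellowtent/appsdata/<app-id>/data.
# $1: root of the mounted file system, e.g. /mnt/snapshot (hypothetical)
# $2: path relative to /app/data, e.g. data/config.php for EspoCRM
find_app_file() {
  local root="$1" rel="$2"
  # Print size in bytes and full path, so empty (0-byte) copies stand out.
  find "$root" -path "*/appsdata/*/data/$rel" -type f -printf '%s\t%p\n' 2>/dev/null
}
```

For EspoCRM's file that would be `find_app_file /mnt/snapshot data/config.php`.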

marcusquinn

The problem is that EspoCRM's Administration panel writes changes back to /app/data/data/config.php, and that file also holds the database connection details, the password hash and basically everything else the instance needs to work.

So when the disk was full, it seems to have somehow written out an empty (0 KB) config.php.
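An interrupted write on a full disk can leave files truncated to zero bytes, like this config.php. A hedged sketch for spotting other victims in a data directory (GNU find; the function name and default pattern are illustrative). Some files are legitimately empty, so treat the output as a list to check, not a verdict.

```bash
#!/usr/bin/env bash
# After a disk-full incident, list files that were left at 0 bytes.
# $1: directory to scan; $2: optional name pattern (default: *.php)
find_truncated() {
  local dir="$1" pattern="${2:-*.php}"
  find "$dir" -type f -name "$pattern" -empty -print 2>/dev/null | sort
}
```

Run it against your own app data path, e.g. `find_truncated /path/to/appdata`, or pass a different pattern such as `'*.json'`.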

And because the encrypted rsync backup of EspoCRM was failing, the Cloudron backups aren't complete.

That leaves restoring the provider's backup snapshot and digging around.

Basically, whatever you do, never let the disk fill up: the cascade of problems that interruption can cause is one massive time sink.

marcusquinn

@marcusquinn Holy sh*t, with some dumb luck and trying everything I know, I seem to have fixed it.

Lesson learnt: never run out of disk space. Sod's law says it will be the apps you rely on most that get corrupted.

Now, given the many open ways to load up a Cloudron with data (email/FilePizza/PrivateBin) maybe there's a way to avoid this causing a total fail?

jdaviescoates

@marcusquinn said in Disk space should never bring a whole server down:

Now, given the many open ways to load up a Cloudron with data (email/FilePizza/PrivateBin) maybe there's a way to avoid this causing a total fail?

I think FilePizza is fully P2P, so I'm not sure you could fill the server up with that (but you could with Jirafeau).

But yeah, I reckon configurable disk space notifications (e.g. email/notify me hourly/daily/whatever once I've only got x space left) would be a good first step towards preventing this.
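Notifications of the "only X GB left" kind can be approximated with cron today. A minimal sketch, assuming GNU df and a working mail setup for cron; the script path, schedule and function names are hypothetical. It stays silent unless space is low, because cron typically mails a job's output to MAILTO and sends nothing when there is no output.

```bash
#!/usr/bin/env bash
# "Tell me when only X GiB are left" check, meant to be run from cron.
# Example crontab line (hypothetical script path, hourly schedule):
#   0 * * * * /usr/local/bin/low-space-check.sh

# Print free space of the filesystem holding $1, in whole GiB (rounded down).
free_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Print a notice only if the filesystem holding $1 has less than $2 GiB free;
# print nothing otherwise, so cron has nothing to mail.
notify_low_space() {
  local path="$1" min_gib="$2" free
  free=$(free_gib "$path")
  if [ "$free" -lt "$min_gib" ]; then
    echo "Low disk space on $path: ${free} GiB free (limit ${min_gib} GiB)"
  fi
}
```

The script would end with a call such as `notify_low_space / 10` to get a message whenever less than 10 GiB is free; changing the crontab schedule gives the hourly/daily choice.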

marcusquinn

Quick fix idea: maybe 70% full is a better nag threshold?

bestknownhost

Thanks for all the feedback here. We discovered Cloudron a while back and have been testing it on a number of servers over the last couple of months. We wanted to get a good handle on how everything works before rolling anything out into production. Firstly, it's an excellent platform and fills a great need. But we did run into a problem with one of our test servers running on a DigitalOcean droplet.

About two weeks ago, its disk usage went from 20 GB to nearly 80 GB in the space of 4 hours. We received an alert from DigitalOcean, but things were happening so fast that all we could initially do was upgrade the instance. That gave us half an hour, then we had to do it again, and then we just attached a 100 GB volume. Although it was only a test server, there was a WordPress app we were fond of, so we transferred it off the Cloudron and left a Pixelfed app. Somewhere between shutting down the server to add the volume and moving the WordPress app, the space usage stopped increasing. I know what you're thinking: WordPress, right? No, we checked the install beforehand and it was working fine on another server.

We then removed the 100 GB volume, resized the DigitalOcean server back to its original size, and everything was back to normal. I figured that some server updates ran that morning and started an out-of-control process, and that resizing the server up and down somehow got rid of the problem.
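For a sense of scale: 60 GB in 4 hours is about 15 GB per hour, so 20 GB of free space would last only about 80 minutes. A rough sketch for measuring the current growth rate and extrapolating linearly (bash, GNU df; the function name is hypothetical). A straight-line estimate is crude, but it tells you whether you have minutes or days.

```bash
#!/usr/bin/env bash
# Rough linear estimate of how long until the filesystem holding $1 fills up,
# based on two df samples taken $2 seconds apart (default 60).
time_to_full() {
  local path="$1" interval="${2:-60}" used1 used2 avail
  used1=$(df -Pk "$path" | awk 'NR == 2 { print $3 }')
  sleep "$interval"
  read -r used2 avail < <(df -Pk "$path" | awk 'NR == 2 { print $3, $4 }')
  if [ "$used2" -le "$used1" ]; then
    echo "not growing"
    return 0
  fi
  # seconds left = free KiB / (KiB grown per second); reported in minutes
  echo "$(( avail * interval / (used2 - used1) / 60 )) minutes until full"
}
```

For example, `time_to_full / 300` compares two samples taken five minutes apart.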

robi

@bestknownhost Did you perhaps have AdGuard installed?

bestknownhost

@robi No we didn't.

robi

@bestknownhost did you figure out what was filling up the disk with du -sh /* and drilling down?
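The drill-down can be scripted a little. This sketch assumes GNU du and sort (`--max-depth`, `-h`); the function name is hypothetical. It lists directories only: files sitting directly in the starting directory count towards its total but aren't shown individually.

```bash
#!/usr/bin/env bash
# Show the biggest directories directly under $1 (default /), largest first,
# staying on one filesystem (-x) so /proc and other mounts are skipped.
# Re-run it on the largest entry to drill further down.
largest_entries() {
  local dir="${1:-/}" n="${2:-10}"
  du -xh --max-depth=1 "$dir" 2>/dev/null | sort -rh | head -n "$n"
}
```

Start with `largest_entries /`, then re-run it on the biggest entry (e.g. `largest_entries /home`) until the culprit shows up.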

nebulon

@bestknownhost To start with, could you clarify whether you are using external backup storage or just the local disk for now? Backing up to the local disk can make disk usage grow quickly, depending on how much data is on the server.

If that's not the case, you may have hit an issue we saw recently with MySQL binlogs: https://forum.cloudron.io/topic/4510/able-to-clean-up-binlog-files-in-var-lib-mysql-directory?_=1616402616926
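To check whether binlogs are the culprit before following the cleanup in that thread, this sketch sums the sizes of the binlog files. It assumes MySQL's data directory is /var/lib/mysql (as in the thread title) and the usual `<basename>.NNNNNN` file naming; the server variable `log_bin_basename` shows the real base name. The function name is hypothetical, and it only measures, it deletes nothing.

```bash
#!/usr/bin/env bash
# Sum the size of MySQL binary log files in $1 (default /var/lib/mysql).
# ASSUMPTION: files are named <basename>.NNNNNN (binlog.000042, mysql-bin.000042);
# the index file (e.g. binlog.index) does not match and is ignored.
binlog_total() {
  local dir="${1:-/var/lib/mysql}"
  find "$dir" -maxdepth 1 -type f -name '*bin*.[0-9][0-9][0-9][0-9][0-9][0-9]' -printf '%s\n' 2>/dev/null |
    awk '{ total += $1 } END { printf "%d bytes in %d binlog files\n", total, NR }'
}
```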

And as @robi mentioned, do you have any idea so far what is using all that disk space?

DigitEgal

@marcusquinn I ran into a similar issue while testing some things, most likely caused by the Nextcloud plugin "External Sites".
I'm not sure right now, but I don't think it recreates files; more likely it writes a huge amount of logs, since the CPU got pushed as well.

(THIS IS NOT A TUTORIAL! IT'S ONLY FOR REPRODUCING THE BUG!)
How to reproduce the issue:
1. Create a Nextcloud instance and share a folder (structure) via a public link.
2. Insert this link into any secondary website (WordPress etc.) as a button that does NOT open a new tab.
3. Add the "External Sites" plugin to Nextcloud, go to its settings and add the secondary website.
4. With External Sites in embed mode, the issue can be triggered by a user with access to the External Sites buttons.
4.1 *Actually, I did not test it with a non-admin user as the "trigger" user.

How do you actually trigger the disk filling up?
-> Now follow the link in Nextcloud to your secondary website.
-> Clicking the button that leads back into Nextcloud triggers the issue.

(THIS IS NOT A TUTORIAL! IT'S ONLY FOR REPRODUCING THE BUG!)

makemrproper

Here is my SOLUTION:

It doesn't fix the root cause of running out of space, but this approach buys you time.

Generate 3 files of 2 gigabytes each.
This is one way of generating these files:
fallocate -l 2G /storage-padding-buffer-2-gb-file1.img
fallocate -l 2G /storage-padding-buffer-2-gb-file2.img
fallocate -l 2G /storage-padding-buffer-2-gb-file3.img

When your server runs out of storage, you can delete one or all of these padding files to regain the space you need to rescue the server.
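A slightly more robust version of the padding idea, as a sketch: it can be re-run safely and gives a one-command way to release space. Assumptions: the filesystem supports fallocate (ext4 and XFS do, not every filesystem does), and the function and file names are hypothetical. The padding must live on the same filesystem that fills up, since deleting it only frees space there.

```bash
#!/usr/bin/env bash
# Create $2 padding files of size $3 (fallocate syntax, e.g. 2G) in directory $1,
# skipping any that already exist, so the script is safe to re-run.
make_padding() {
  local dir="$1" count="$2" size="$3" i f
  for i in $(seq 1 "$count"); do
    f="$dir/storage-padding-buffer-$i.img"
    if [ ! -e "$f" ]; then
      fallocate -l "$size" "$f"
    fi
  done
}

# Emergency valve: delete one padding file in $1 to free space immediately.
release_padding() {
  local f
  for f in "$1"/storage-padding-buffer-*.img; do
    # If the glob matched nothing, $f is the literal pattern and does not exist.
    [ -e "$f" ] || { echo "no padding files left in $1"; return 1; }
    rm -f -- "$f"
    echo "released $f"
    return 0
  done
}
```

`make_padding / 3 2G` reserves the same three 2 GB files as the commands above; in an emergency, `release_padding /` deletes one of them.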

I have had the same issue with Cloudron, because storage runs out over time.
For now I chose not to upgrade the storage of my VPS, because it would double my hosting cost for this node, from USD 400 to USD 800 per year. That's DigitalOcean pricing for you, but I digress.

This is a systems engineering issue and isn't caused by Cloudron. However I would not have anything against an elegant solution from the team if it were possible :).

I should also mention that I'm working on a post describing how I went about a massive cleanup and exactly which steps I took to regain loads of space. TL;DR: use ncdu, analyze all containers, identify where apps store their logs and rotate them, and clear the npm package cache in each container. More to come.
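For the log part of that cleanup, a non-interactive complement to ncdu: this sketch lists large log-like files, biggest first. It assumes GNU find (`-printf`); the name patterns, the 100M default and the function name are my own choices, not taken from the post. It only lists files; rotating or truncating them is a separate step.

```bash
#!/usr/bin/env bash
# List log-like files under $1 (default /) larger than $2 (find -size syntax,
# default +100M), biggest first, without crossing into other filesystems.
# Docker's json-file log driver writes <id>-json.log files, which match *.log.
big_logs() {
  local root="${1:-/}" min="${2:-+100M}"
  find "$root" -xdev -type f \( -name '*.log' -o -name '*.log.[0-9]*' \) \
    -size "$min" -printf '%s\t%p\n' 2>/dev/null | sort -rn
}
```

For example, `big_logs / +500M` shows only files over 500 MiB.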

marcusquinn

Maybe the Cloudron app needs to generate its own partition to run from, where regular app storage can't saturate the OS or Cloudron partitions?

girish

@marcusquinn said in Disk space should never bring a whole server down:

Maybe the Cloudron app needs to generate its own partition to run from, where regular app storage can't saturate the OS or Cloudron partitions?

Right. The main issue is that it's not possible to create proper disk partitions on a VPS, i.e. one can only create file-backed loopback file systems, but those are not meant for production use and I have no idea about their reliability/durability.

Cloudron Forum