[Solved] Disk is (suddenly) full on 1TB drive, can't access Cloudron
Hello. A few days ago my disk usage was around ~20% when I checked. Today I got an alert that my disk was full, yet the usage I see when running some `du` commands on my server doesn't reflect that. I can't access Cloudron because the disk is completely full. It appears to have filled up rapidly within a few hours, with no changes on my end.
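For anyone chasing the same df/du mismatch, here is a minimal diagnostic sketch (the paths are just examples; a large gap between the two numbers usually points at deleted-but-still-open files, or at files hidden under a mount point):

```shell
# df reports the kernel's block accounting for the whole filesystem;
# du only sums files it can currently reach, so compare the two.
df -h /
du -xsh /var /tmp 2>/dev/null || true
# Deleted files still held open by a process consume space du cannot see
# (needs lsof installed; harmless to skip if it is not).
command -v lsof >/dev/null 2>&1 && lsof +L1 2>/dev/null | head -n 5 || true
```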
I've had problems with my separate local backup server recently and thought maybe Cloudron had fallen back to storing backups on the local disk, but I checked `/var/backups` and there is nothing there larger than a few KB. Unless there is another place the backups might be stored?
I got a glimpse of the disk usage statistics on the Cloudron dashboard before I stopped being able to access it; it showed that roughly 90% of the usage was the green "System" category. Not sure if that helps diagnose what's going on.
Seeking any tips for debugging this, thanks! (really hoping it's not a data corruption issue..)
marcusquinn last edited by
Cross-post of a thread we had on this same subject a while back:
This was my situation after a long, long search:
- Since Cloudron doesn't notify the admin by mail when something is wrong (like "backup not succeeded" or "CIFS connection lost"), it took a few days before I noticed the backup failures caused by a CIFS disconnection.
- I reconnected and everything seemed fine, except that in Zabbix I noticed the disk usage graph had increased.
- Long story short: when I unmounted the CIFS share I noticed the "hidden" backups at the mount path (written before the share was connected)! I deleted all the backups there and mounted it again: SOLVED.
This same issue was on 2 of my 4 Cloudron Premium servers.
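The check described above can be sketched as a small script (`/mnt/backups` is the path from this thread; adjust it to your setup):

```shell
# Check whether the backup path is currently a real mounted share.
BACKUP_DIR=/mnt/backups
if findmnt "$BACKUP_DIR" >/dev/null 2>&1; then
  echo "$BACKUP_DIR is mounted; stray files may be hidden underneath the mount"
  # The steps from the post: unmount to reveal them, purge, remount.
  #   umount "$BACKUP_DIR"
  #   rm -rf "$BACKUP_DIR"/snapshots
  #   mount "$BACKUP_DIR"
else
  echo "$BACKUP_DIR is NOT mounted: anything written there lands on the root disk"
fi
```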
Thanks for the responses guys!
I realized that my backup server was unmounted after it bugged out the other day, and Cloudron was storing backups locally in the `/mnt/backups/snapshots` folder. After purging that folder the disk space issue is resolved.
However! My Cloudron instance is still inaccessible.
I followed all of the steps in this troubleshooting guide, to no avail.
After rebooting once the disk space was cleaned up, both `unbound` and `nginx` were in an error state. `unbound` restarted right away, but `nginx` had some issues with old certs preventing it from restarting. After purging the old certs (which the troubleshooting guide says is safe), `nginx` was able to restart and is now running.
Unfortunately, my Cloudron instance is still inaccessible and I'm not sure why. All other services mentioned in the troubleshooting guide (`box`) are working properly according to the logs.
As far as I can tell everything is working properly; I just can't access my Cloudron instance and don't know where to go from here. Any ideas for troubleshooting?
EDIT: Looks like `nginx` just died again for some reason. It restarted successfully once after I purged the old certs, but now it's hitting the same error again even though the certs are gone.
This is the error it's giving me when I run `nginx`:
[emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.myserver.net.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
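That PEM error usually means the key file is empty or truncated. Before deleting anything, `openssl` can confirm whether the file parses (the path below is the one from the error message above):

```shell
# Verify the key file nginx is complaining about actually contains a PEM key.
KEY=/home/yellowtent/platformdata/nginx/cert/_.myserver.net.key
if [ ! -s "$KEY" ]; then
  echo "key file is missing or empty"
elif openssl pkey -in "$KEY" -noout 2>/dev/null; then
  echo "key parses fine; the problem is elsewhere"
else
  echo "key exists but does not parse as a PEM private key"
fi
```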
@shan Delete the nginx config files as well and then run `systemctl restart box`. This will regenerate the nginx configs and cert files. After that, you will be able to access the dashboard. Go into each app's Location view and click Save; that will regenerate the nginx config of each app.
(This tedious process is automated/fixed in next release.)
@girish I've deleted the `nginx` conf file (`/home/yellowtent/platformdata/nginx/nginx.conf`) and am encountering a new error. It seems `systemctl restart box` did not regenerate it.
[emerg] open() "/etc/nginx/nginx.conf" failed (2: No such file or directory)
@shan Oh, my bad. I should have been clearer that only the app configs have to be deleted. Anyway, please run `/home/yellowtent/box/setup/start.sh`, which will recreate the nginx config files.
@girish that seems to have fixed all my problems! Can access the dashboard again. Looking forward to the next release when this is automated lol.
The root cause of this was that the backup continued even though the backup disk was not mounted; with that known, we were able to find the bug that caused this and possibly other similar issues.
The check for the mountpoint itself was correct, but its result was simply ignored by the code. This oversight will be fixed in the next release and should prevent such cases for mounted backup volumes in the future.
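To illustrate the class of bug described (not Cloudron's actual code; `do_backup` is a hypothetical stand-in so the sketch runs):

```shell
# Hypothetical stand-in for the real backup step.
do_backup() { echo "backing up $1"; }
BACKUP_DIR=/mnt/backups

# Buggy pattern: the mount check runs, but its exit status is discarded,
# so the backup proceeds onto the root disk when the share is down.
mountpoint -q "$BACKUP_DIR" || true
do_backup "$BACKUP_DIR"

# Fixed pattern: the check's result gates the backup.
if mountpoint -q "$BACKUP_DIR"; then
  do_backup "$BACKUP_DIR"
else
  echo "backup volume not mounted; skipping backup" >&2
fi
```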