out of space error leading to missing certs

roofboard

@girish said in out of space error leading to missing certs:

/etc/nginx/applications/

The Key being referred to is definitely a zero byte file, also draglabs.com is the main domain to which I log in. if it possible that conf is regenerating pointers to the home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": ?

roofboard

@girish said in out of space error leading to missing certs:

e-generate the nginx config for the dashboard alone. Once you have access to the dashboard, you can go to Location section of each app and click save. This will regenerate nginx config of the app

FIXED!!!

It is difficult to tell if deleting the conf files from the folder /etc/nginx/application and then restarting unbound Instructions then using systemctl restart nginx and systemctl restart box

I say that because unbound definitely was not working at at one point.
And as I remember nginx did start momentarily.

However the solution came when I deleted the corrupted zero byte private key from the folder /home/yellowtent/platformdata/nginx/cert/

When that file was deleted I was able to log in without ssl using firefox. Once in under the domains and certs section of cloudron I was able to click on Renew All Certs. That fixed SSL, and I was able to go into each program and re assign the dns settings by clicking save.

scooke

@roofboard Phew! Good work persisting, and thanks for sharing the solution.

subven

@girish there is no way to trigger certificate renewal over the (SSH) console?

I had a bug (a couple months ago) I never reported where stopped apps did not get a new cert and nginx failed to launch because of outdated/non valid certs making Cloudron brake (no nginx --> no dashboard) on system reboot. Fixed it by just copying over current cert files from working (non stopped) apps. They where obviously non valid for those stopped apps but I was able to start nginx, start the stopped apps and renew their certs.

So in short: Would be nice to have a way to trigger cert renewal over console command and/or extend the troubleshoot guide with cert related stuff.

roofboard

@girish

Also this whole issue was caused by running out of space - I took a look at some of the other posts on out of space crashes and can tell it is a difficult problem to solve.

Supposedly there is a running out of space warning but i never got that warning.

I was thinking that a good solution for the running out of space error would involve taking the remaining space cron which calculates remaining space every 'n' minutes and integrating it over 'x' hours to arrive at time to disk full.

This could relatively accurately predict if an out of space crash is pending or imminent - and if so... do things like stop processes prevent backup (if backing up to local filesystem) etc.

Essentially

predict the crash with a pinch of calculus.
send a warning to the administrator.
follow a contingency to protect the sever.

Because I could imagine many ways this could happen, and my example is ONLY one way. A program can crash Cloudron I could have been copying video files, It could have been NextCloud, a spam attack on a mailserver.

girish

@roofboard yes, agreed. I don't like it the way it currently right now that filling up disk space brings everything down. Currently, we have a simple cron checker which will give alerts if it's nearing some amount of disk space but this fails in many cases because it runs only every 6 hours or so (it's not run too often to prevent disk churn).

I think a good long term solution is to figure out how to limit disk usage of apps. I think another thread there is a idea that maybe all appdata can be stored in a XFS partition. We can then enforce quotas on apps.

girish

@subven said in out of space error leading to missing certs:

nginx failed to launch because of outdated/non valid certs making Cloudron brake (no nginx --> no dashboard) on system reboot.

Yes, indeed, this is a bug. As @roofboard also found out, the code check is a cert file exists but not if it's corrupt. I will get this fixed, so at the very least, restarting the box code will get the dashboard back up.

robi

@girish said in out of space error leading to missing certs:

only every 6 hours or so

The predictive aspect of @roofboard's suggestion is also a good one by tracking a bit of the rate of change, perhaps speeding up in frequency as we approach higher thresholds (>80%+) and slowing down when out of the danger zone(<80%).

Combining this with an email to the admin which is more likely to be seen than a UI notification would be great, until we add the external mobile notification integration via external messaging services.. which is in the pipeline.

mehdi

@girish said in out of space error leading to missing certs:

@roofboard yes, agreed. I don't like it the way it currently right now that filling up disk space brings everything down. Currently, we have a simple cron checker which will give alerts if it's nearing some amount of disk space but this fails in many cases because it runs only every 6 hours or so (it's not run too often to prevent disk churn).

I think a good long term solution is to figure out how to limit disk usage of apps. I think another thread there is a idea that maybe all appdata can be stored in a XFS partition. We can then enforce quotas on apps.

A good shorter term solution would be to allow to configure the level below which the alert is sent. Depending on if you use your server for storing text files, or if you download video, your "low disk" tolerance will be wildly different.

roofboard

@mehdi

Yah... Maybe it would be enough just to give the administrator a few controls.

the cron interval
the warning level
the kill level (kill all modules)

robi

@subven said in out of space error leading to missing certs:

@girish there is no way to trigger certificate renewal over the (SSH) console?

I'd like an answer to this question.. as I just ran into the missing cert problem too.

Having deleted all the conf/cert files, and gotten nginx started, the UI is still not accessible after box restart. All apps are inaccessible too.

box restart seems to recreate the /etc/nginx/applications/my.domain.conf BUT doesn't check if the /home/yellowtent/platformdata/nginx/certs/my.domain.host.cert is there.

How are they regenerated from the CLI?

Cloudron makes it easy to run web apps like WordPress, Nextcloud, GitLab on your server. Find out more or install now.

Cloudron Forum

out of space error leading to missing certs