out of space error leading to missing certs

girish

@roofboard So, you just have to delete the app config files in /etc/nginx/applications and then run `systemctl restart nginx` and `systemctl restart box`.

When you restart box, it will regenerate the nginx config for the dashboard alone. Once you have access to the dashboard, you can go to the Location section of each app and click Save. This will regenerate the nginx config of the app.

/etc/nginx/nginx.conf should be:

```nginx
user www-data;

# detect based on available CPU cores
worker_processes  auto;

# this is 4096 by default. See /proc/<PID>/limits and /etc/security/limits.conf
# usually twice the worker_connections (one for upstream, one for downstream)
# see also LimitNOFILE=16384 in systemd drop-in
worker_rlimit_nofile 8192;

pid /run/nginx.pid;

events {
    # a single worker can have this many simultaneous connections at most
    worker_connections  4096;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    # required for long host names
    server_names_hash_bucket_size 128;

    access_log /var/log/nginx/access.log combined;

    sendfile        on;

    # timeout for client to finish sending headers
    client_header_timeout 30s;

    # timeout for reading client request body (successive read timeout and not whole body!)
    client_body_timeout 60s;

    # keep-alive connections time out after 65s. this is because many browsers time out at 60 seconds
    keepalive_timeout  65s;

    # zones for rate limiting
    limit_req_zone $binary_remote_addr zone=admin_login:10m rate=10r/s; # 10 requests a second

    include applications/*.conf;
}
```
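The comments in that config say `worker_rlimit_nofile` is usually twice `worker_connections` (one file descriptor for the client side and one for the upstream side of each connection), which is where 2 × 4096 = 8192 comes from. If you hand-edit the file, a small sanity check like the one below can confirm the ratio still holds. This is only a sketch: the function name is made up, and it uses simple regexes rather than a real nginx parser, so it assumes each directive sits on its own line.

```python
import re
from typing import Tuple

def nofile_headroom(conf_text: str) -> Tuple[int, int, bool]:
    """Return (worker_connections, worker_rlimit_nofile, ok), where ok means the
    fd limit is at least twice worker_connections (client side + upstream side)."""
    conns = re.search(r"^\s*worker_connections\s+(\d+)\s*;", conf_text, re.M)
    rlimit = re.search(r"^\s*worker_rlimit_nofile\s+(\d+)\s*;", conf_text, re.M)
    if conns is None or rlimit is None:
        raise ValueError("worker_connections or worker_rlimit_nofile not found")
    c, r = int(conns.group(1)), int(rlimit.group(1))
    return c, r, r >= 2 * c
```

Fed the contents of the nginx.conf above, it returns `(4096, 8192, True)`.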

girish

Also, the nginx configs (and certs) are "throwaway". The code is written so that they can be regenerated from the configs in the database.

roofboard

@girish
OK, I restored the nginx.conf file in yellowtent, then moved everything in /etc/nginx/applications into a new folder called old. I did a restart and it is still not coming up...

```
root@my:/etc/nginx/applications# ls
0fa72b5f-441d-4bef-bee3-665f4d85dc3e.conf  4b5dbf96-42b4-4a13-9b9f-15d5228dce9c.conf  a1c46e70-b09e-419f-8461-3e8e40da3870.conf  b3cbed12-eecc-42f2-93ba-b0834a3b3f5b.conf  default.conf
1a907fb3-616a-4b71-930d-c132adc14357.conf  4eaa7fe2-9c72-46c7-946e-f7ed41891a72.conf  a9948920-c8d0-4e14-9139-45ce8a78b549.conf  b892da04-793f-4449-a6d4-ed8564455d46.conf  e67529c6-edb3-47a5-890f-580adc2d7c61.conf
3d520625-8452-4e93-87c7-e03f89e4286b.conf  9cbc7dcd-5202-4e5f-9730-9491d8dc4077.conf  abfd70d6-750a-4621-9072-82da26e9df8f.conf  bdfaef04-4f9d-433e-aaf7-44e6146acb01.conf  my.draglabs.com.conf
root@my:/etc/nginx/applications# sudo mv *.conf old/
root@my:/etc/nginx/applications# ls
old
```

@roofboard
When I try to start nginx in one tab while running `journalctl -u nginx -fa` in another, this is the error I get:

```
Jun 03 20:58:21 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 4.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 20:58:21 my.draglabs.com nginx[22106]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Start request repeated too quickly.
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
```

girish

@roofboard said in out of space error leading to missing certs:

Jun 03 20:58:21 my.draglabs.com nginx[22106]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)

Some nginx config file is loading this file (it's under /etc/nginx/applications/*; you can temporarily move all the files there somewhere else). Can you please check which one? That conf needs to be deleted and then nginx has to be restarted. Most likely nginx is not starting because the key is a 0-byte file.
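Finding "which one" is essentially a grep for the key path across the app configs. The sketch below does the same thing in Python and also reports the size of each referenced file, so an empty (0-byte) or missing cert/key stands out. It assumes the app configs point at certs through the standard nginx `ssl_certificate` / `ssl_certificate_key` directives, one per line, which is what the error message implies; I have not checked the exact layout of Cloudron's generated configs, and the function name is made up.

```python
import glob
import os
import re
from typing import List, Optional, Tuple

# Matches "ssl_certificate <path>;" and "ssl_certificate_key <path>;" directives.
SSL_DIRECTIVE = re.compile(r"^\s*(ssl_certificate(?:_key)?)\s+([^;\s]+)\s*;", re.M)

def find_ssl_references(
    conf_dir: str = "/etc/nginx/applications",
) -> List[Tuple[str, str, str, Optional[int]]]:
    """For every *.conf file directly inside conf_dir, return
    (conf_file, directive, referenced_path, size_in_bytes or None if missing)."""
    results = []
    for conf in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        with open(conf, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
        for directive, path in SSL_DIRECTIVE.findall(text):
            path = path.strip("\"'")  # nginx allows quoted arguments
            size = os.path.getsize(path) if os.path.isfile(path) else None
            results.append((conf, directive, path, size))
    return results
```

Run it as root and look for entries whose size is `0` or `None`; those confs are the ones pointing at a broken or missing cert.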

roofboard

@girish
Some config file located in the folder /etc/nginx/applications?

If so, can it still see into the old folder where I put all the old config files?

roofboard

@girish hmmmm
When I moved the conf files all the way out of the nginx folder into /old (so the app config files were gone from /etc/nginx/applications) and then ran `systemctl restart nginx` and `systemctl restart box`, nginx started momentarily but Cloudron would not load. I rebooted and tried to start nginx with `systemctl restart nginx`.

Below is the output from `journalctl -u nginx -fa`:

```
Jun 03 21:15:12 my.draglabs.com nginx[12053]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:12 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 1.
Jun 03 21:15:12 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:12 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 21:15:12 my.draglabs.com nginx[12062]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:12 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 2.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 21:15:13 my.draglabs.com nginx[12068]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 3.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 21:15:13 my.draglabs.com nginx[12070]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 4.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 21:15:13 my.draglabs.com nginx[12072]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Start request repeated too quickly.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
```

roofboard

@girish said in out of space error leading to missing certs:

/etc/nginx/applications/

The key being referred to is definitely a zero-byte file. Also, draglabs.com is the main domain I log in to. Is it possible that the conf is regenerating pointers to /home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key?

roofboard

@girish said in out of space error leading to missing certs:

e-generate the nginx config for the dashboard alone. Once you have access to the dashboard, you can go to Location section of each app and click save. This will regenerate nginx config of the app

FIXED!!!

It is difficult to tell whether it helped to delete the conf files from /etc/nginx/applications, restart unbound (following the instructions), and then run `systemctl restart nginx` and `systemctl restart box`.

I say that because unbound definitely was not working at one point.
And as I remember, nginx did start momentarily.

However, the real fix came when I deleted the corrupted zero-byte private key from /home/yellowtent/platformdata/nginx/cert/.
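The OpenSSL part of the error ("no start line: Expecting: ANY PRIVATE KEY") means it never found a `-----BEGIN ...` line in the file, which is exactly what an empty or truncated key looks like. A sketch for spotting such files in the cert directory before deleting anything by hand is below. It only reports, it deletes nothing; the directory path is the one from the logs, and the function name is hypothetical.

```python
import os
from typing import List, Tuple

# OpenSSL's "no start line" error means it never found a line beginning like this.
PEM_START = b"-----BEGIN "

def suspicious_pem_files(
    cert_dir: str = "/home/yellowtent/platformdata/nginx/cert",
) -> List[Tuple[str, str]]:
    """Return (path, reason) for regular files in cert_dir that are empty or
    contain no PEM start line. Only reports; never deletes anything."""
    problems = []
    with os.scandir(cert_dir) as it:
        entries = sorted((e for e in it if e.is_file()), key=lambda e: e.name)
    for entry in entries:
        with open(entry.path, "rb") as fh:
            data = fh.read()
        if not data:
            problems.append((entry.path, "zero bytes"))
        elif PEM_START not in data:
            problems.append((entry.path, "no PEM start line"))
    return problems
```

Passing this check does not prove a file is valid (a file can have a PEM header and still be corrupt), it just catches the empty-file case seen here cheaply.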

Once that file was deleted, I was able to log in without SSL using Firefox. Once in, under the domains and certs section of Cloudron, I was able to click Renew All Certs. That fixed SSL, and I was able to go into each app and reassign the DNS settings by clicking Save.

scooke

@roofboard Phew! Good work persisting, and thanks for sharing the solution.

subven

@girish there is no way to trigger certificate renewal over the (SSH) console?

A couple of months ago I had a bug (which I never reported) where stopped apps did not get a new cert, and on system reboot nginx failed to launch because of outdated/invalid certs, which broke Cloudron (no nginx --> no dashboard). I fixed it by just copying over the current cert files from working (non-stopped) apps. They were obviously not valid for those stopped apps, but it let me start nginx, start the stopped apps, and renew their certs.

So in short: it would be nice to have a way to trigger cert renewal with a console command and/or to extend the troubleshooting guide with cert-related topics.

roofboard

@girish

Also, this whole issue was caused by running out of space. I took a look at some of the other posts on out-of-space crashes and can tell it is a difficult problem to solve.

Supposedly there is an out-of-space warning, but I never got it.

I was thinking that a good solution for the out-of-space problem would be to take the cron job that calculates remaining space every 'n' minutes and track how that value changes over 'x' hours (its rate of change) to estimate the time until the disk is full.

This could predict fairly accurately whether an out-of-space crash is pending or imminent, and if so, do things like stop processes, prevent backups (if backing up to the local filesystem), etc.

Essentially:

- predict the crash with a pinch of calculus
- send a warning to the administrator
- follow a contingency plan to protect the server
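The "pinch of calculus" amounts to estimating the slope of free space over time and extrapolating to zero. One simple way to do that, sketched below, is a least-squares straight-line fit over the last few hours of samples. The fitting method, the sample format and the function name are my own assumptions for illustration, not anything Cloudron implements.

```python
from typing import List, Optional, Tuple

def hours_until_full(samples: List[Tuple[float, float]]) -> Optional[float]:
    """samples: (hours_since_start, free_gb) pairs, oldest first.
    Fit free space as a straight line over time (least squares) and return how
    many hours after the last sample it would reach zero at that rate.
    Returns None if there is too little data or free space is flat/growing."""
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # slope in GB per hour; negative means the disk is filling up
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if slope >= 0:
        return None
    _, last_free = samples[-1]
    return max(last_free / -slope, 0.0)
```

For example, samples of 100, 90, 80 and 70 GB free taken an hour apart give a slope of -10 GB/h and about 7 hours until full. A linear fit reacts slowly to sudden bursts (a large video copy, a spam flood), so in practice it would probably sit alongside a plain absolute threshold rather than replace it.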

I can imagine many ways this could happen, and my case is ONLY one of them. Any program can crash Cloudron this way: I could have been copying video files, or it could have been Nextcloud, or a spam attack on a mail server.

girish

@roofboard yes, agreed. I don't like the way it currently works, where filling up the disk brings everything down. Currently, we have a simple cron checker which gives alerts when disk space is running low, but this fails in many cases because it runs only every 6 hours or so (it's not run more often, to avoid disk churn).

I think a good long-term solution is to figure out how to limit the disk usage of apps. In another thread, there is an idea that all appdata could be stored on an XFS partition. We could then enforce quotas on apps.

girish

@subven said in out of space error leading to missing certs:

nginx failed to launch because of outdated/non valid certs making Cloudron break (no nginx --> no dashboard) on system reboot.

Yes, indeed, this is a bug. As @roofboard also found out, the code checks whether a cert file exists but not whether it's corrupt. I will get this fixed so that, at the very least, restarting the box code will get the dashboard back up.
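The difference between "exists" and "not corrupt" can be shown with Python's standard `ssl` module: `SSLContext.load_cert_chain()` asks OpenSSL to actually parse the certificate and private key (and check that they match), so it rejects the same zero-byte key that made nginx exit. This is only an illustration of the idea; it is not Cloudron's code, and both function names are hypothetical.

```python
import os
import ssl

def cert_pair_exists(cert_path: str, key_path: str) -> bool:
    """Weak check: both files are present. A zero-byte key still passes."""
    return os.path.isfile(cert_path) and os.path.isfile(key_path)

def cert_pair_loads(cert_path: str, key_path: str) -> bool:
    """Stronger check: OpenSSL can parse the cert and key and they belong together.
    Empty, truncated or mismatched files make load_cert_chain raise."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except (ssl.SSLError, OSError):  # parse errors, or missing/unreadable files
        return False
    return True
```

A check like `cert_pair_loads` run before writing an app's nginx config would let the config fall back to some other cert (or skip the app) instead of taking nginx down with it.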

robi

@girish said in out of space error leading to missing certs:

only every 6 hours or so

The predictive aspect of @roofboard's suggestion is also a good one: track the rate of change, and perhaps check more frequently as usage approaches higher thresholds (80% and above) and less frequently when out of the danger zone (below 80%).

Combining this with an email to the admin, which is more likely to be seen than a UI notification, would be great until we add the mobile notification integration via external messaging services, which is in the pipeline.
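An adaptive schedule like that could be as simple as a function that picks the delay before the next check from the current usage. The sketch below keeps a 6-hour interval below the threshold (matching the "every 6 hours or so" mentioned above) and shrinks it linearly from 60 minutes at 80% used down to 5 minutes at 100%. All of the numbers and the function name are illustrative assumptions.

```python
def next_check_minutes(used_percent: float,
                       threshold: float = 80.0,
                       calm_minutes: float = 360.0,
                       danger_start_minutes: float = 60.0,
                       min_minutes: float = 5.0) -> float:
    """Minutes to wait before the next disk check.
    Below the threshold: the relaxed interval (few writes, little churn).
    At or above it: shrink linearly from danger_start_minutes at the threshold
    to min_minutes at 100% used."""
    if used_percent < threshold:
        return calm_minutes
    # fraction of the danger zone already used: 0.0 at the threshold, 1.0 at 100%
    frac = min((used_percent - threshold) / (100.0 - threshold), 1.0)
    return danger_start_minutes - frac * (danger_start_minutes - min_minutes)
```

At 90% used this gives 32.5 minutes. The jump from 360 to 60 minutes at the threshold is deliberate: it keeps disk churn low in the normal case while reacting quickly once things look risky.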

mehdi

@girish said in out of space error leading to missing certs:

@roofboard yes, agreed. I don't like the way it currently works, where filling up the disk brings everything down. Currently, we have a simple cron checker which gives alerts when disk space is running low, but this fails in many cases because it runs only every 6 hours or so (it's not run more often, to avoid disk churn).

I think a good long-term solution is to figure out how to limit the disk usage of apps. In another thread, there is an idea that all appdata could be stored on an XFS partition. We could then enforce quotas on apps.

A good shorter-term solution would be to allow configuring the level below which the alert is sent. Depending on whether you use your server for storing text files or for downloading video, your "low disk" tolerance will be wildly different.

roofboard

@mehdi

Yeah... maybe it would be enough just to give the administrator a few controls:

- the cron interval
- the warning level
- the kill level (kill all modules)
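Those three controls map naturally onto a small settings object plus a function that decides what to do with the current free space. The sketch below uses `shutil.disk_usage` from the Python standard library to read free space; the class, field names, default values and the `"stop_apps"` action are all hypothetical, and actually stopping apps is left out.

```python
import shutil
from dataclasses import dataclass

@dataclass(frozen=True)
class DiskGuardSettings:
    """Hypothetical admin-tunable settings for the three controls listed above."""
    check_interval_minutes: int = 360   # the cron interval
    warn_free_percent: float = 10.0     # warning level: alert below this much free space
    kill_free_percent: float = 2.0      # kill level: stop apps below this much free space

    def __post_init__(self) -> None:
        if not (0 <= self.kill_free_percent < self.warn_free_percent <= 100):
            raise ValueError("need 0 <= kill level < warning level <= 100")
        if self.check_interval_minutes <= 0:
            raise ValueError("check interval must be positive")

def free_percent(path: str = "/") -> float:
    """Free space on the filesystem containing path, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total

def action_for(free_pct: float, s: DiskGuardSettings) -> str:
    """Map free space to 'ok', 'warn' or 'stop_apps'."""
    if free_pct < s.kill_free_percent:
        return "stop_apps"
    if free_pct < s.warn_free_percent:
        return "warn"
    return "ok"
```

A server that mostly stores text files might run with a low warning level, while one that downloads video would want it much higher, which is the point @mehdi makes about tolerance varying per server.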

robi

@subven said in out of space error leading to missing certs:

@girish there is no way to trigger certificate renewal over the (SSH) console?

I'd like an answer to this question, as I just ran into the missing cert problem too.

Having deleted all the conf/cert files and gotten nginx started, the UI is still not accessible after a box restart. All apps are inaccessible too.

A box restart seems to recreate /etc/nginx/applications/my.domain.conf BUT doesn't check whether /home/yellowtent/platformdata/nginx/certs/my.domain.host.cert is there.

How are they regenerated from the CLI?


Cloudron Forum
