out of space error leading to missing certs
-
@roofboard
Still unresolved. After renaming nginx.conf to old.nginx the error changed, but nginx still does not start. I am now restoring the old nginx.conf file.
-- The job identifier is 2014.
Jun 03 19:46:55 my.draglabs.com nginx[14813]: nginx: [emerg] open() "/etc/nginx/nginx.conf" failed (2: No such file or directory)
Jun 03 19:46:55 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- An ExecStart= process belonging to unit nginx.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Jun 03 19:46:55 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit nginx.service has entered the 'failed' state with result 'exit-code'.
Jun 03 19:46:55 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
-- Subject: A start job for unit nginx.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit nginx.service has finished with a failure.
--
-- The job identifier is 2014 and the job result is failed.
Jun 03 19:46:55 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
-- Subject: Automatic restarting of a unit has been scheduled
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Automatic restarting of the unit nginx.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Jun 03 19:46:55 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
-- Subject: A stop job for unit nginx.service has finished
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A stop job for unit nginx.service has finished.
--
-- The job identifier is 2084 and the job result is done.
Jun 03 19:46:55 my.draglabs.com systemd[1]: nginx.service: Start request repeated too quickly.
Jun 03 19:46:55 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit nginx.service has entered the 'failed' state with result 'exit-code'.
Jun 03 19:46:55 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
-- Subject: A start job for unit nginx.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit nginx.service has finished with a failure.
--
-- The job identifier is 2084 and the job result is failed.
Jun 03 19:46:56 my.draglabs.com systemd[11909]: var-lib-docker-volumes-a5ffd80f\x2d5d66\x2d47ab\x2db651\x2d2bff60681a53\x2dlocalstorage-_data.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit UNIT has successfully entered the 'dead' state.
Jun 03 19:46:56 my.draglabs.com systemd[1]: var-lib-docker-volumes-a5ffd80f\x2d5d66\x2d47ab\x2db651\x2d2bff60681a53\x2dlocalstorage-_data.mount: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit var-lib-docker-volumes-a5ffd80f\x2d5d66\x2d47ab\x2db651\x2d2bff60681a53\x2dlocalstorage-_data.mount has successfully entered the 'dead' state.
Jun 03 19:47:04 my.draglabs.com kernel: Packet dropped: IN=ens3 OUT= MAC=00:00:50:d1:ef:23:fe:00:50:d1:ef:23:08:00 SRC=72.167.32.184 DST=80.209.239.35 LEN=40 TOS=0x00 PREC=0x00 TTL=238 ID=57427 PROTO=TCP SPT=56603 DPT=3389 WINDOW=1024 RES=0x00 SYN URGP=0
Jun 03 19:47:04 my.draglabs.com systemd[1]: systemd-timedated.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-
Have you checked you've got a green light for everything under services? When you run out of space, unbound often needs to be restarted.
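For anyone without dashboard access, roughly the same check can be done from a shell. This is only a sketch: the function names are made up for illustration, and the service names (unbound, nginx, box) are simply the ones mentioned in this thread.

```shell
# Sketch only: helper names are hypothetical, not part of Cloudron.

# Print the usage percentage (without the % sign) of the filesystem holding PATH.
disk_usage_percent() {
    df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Print "<service>: <state>" for each service name given.
# systemctl is only called when it is available on the machine.
report_services() {
    local svc
    for svc in "$@"; do
        if command -v systemctl >/dev/null 2>&1; then
            echo "$svc: $(systemctl is-active "$svc" 2>/dev/null || true)"
        else
            echo "$svc: unknown (systemctl not available)"
        fi
    done
}
```

Usage would be something like `disk_usage_percent /` followed by `report_services unbound nginx box`; anything not reported as active is a candidate for `systemctl restart`.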
-
@jdaviescoates
Looks like unbound was not running. Now it is, but nginx.conf is still not being repopulated and nginx will not start.
root@my:/home/yellowtent/platformdata/nginx/old# systemctl status unbound
● unbound.service - Unbound DNS Resolver
    Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: enabled)
    Active: failed (Result: exit-code) since Fri 2022-06-03 20:26:54 UTC; 6min ago
    Main PID: 1028 (code=exited, status=1/FAILURE)
Jun 03 20:26:54 my.draglabs.com systemd[1]: unbound.service: Scheduled restart job, restart counter is at 5.
Jun 03 20:26:54 my.draglabs.com systemd[1]: Stopped Unbound DNS Resolver.
Jun 03 20:26:54 my.draglabs.com systemd[1]: unbound.service: Start request repeated too quickly.
Jun 03 20:26:54 my.draglabs.com systemd[1]: unbound.service: Failed with result 'exit-code'.
Jun 03 20:26:54 my.draglabs.com systemd[1]: Failed to start Unbound DNS Resolver.
root@my:/home/yellowtent/platformdata/nginx/old# unbound-anchor -a /var/lib/unbound/root.key
root@my:/home/yellowtent/platformdata/nginx/old# systemctl restart unbound
root@my:/home/yellowtent/platformdata/nginx/old# systemctl status unbound
● unbound.service - Unbound DNS Resolver
    Loaded: loaded (/etc/systemd/system/unbound.service; enabled; vendor preset: enabled)
    Active: active (running) since Fri 2022-06-03 20:34:49 UTC; 14s ago
    Main PID: 14789 (unbound)
    Tasks: 1 (limit: 19105)
    Memory: 6.0M
    CGroup: /system.slice/unbound.service
            └─14789 /usr/sbin/unbound -d
Jun 03 20:34:49 my.draglabs.com systemd[1]: Starting Unbound DNS Resolver...
Jun 03 20:34:49 my.draglabs.com unbound[14789]: [14789:0] notice: init module 0: subnet
Jun 03 20:34:49 my.draglabs.com unbound[14789]: [14789:0] notice: init module 1: validator
Jun 03 20:34:49 my.draglabs.com unbound[14789]: [14789:0] notice: init module 2: iterator
Jun 03 20:34:49 my.draglabs.com unbound[14789]: [14789:0] info: start of service (unbound 1.9.4).
Jun 03 20:34:49 my.draglabs.com systemd[1]: Started Unbound DNS Resolver.
root@my:/home/yellowtent/platformdata/nginx/old# systemctl status nginx
● nginx.service - nginx - high performance web server
    Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/nginx.service.d
             └─cloudron.conf
    Active: failed (Result: exit-code) since Fri 2022-06-03 20:33:30 UTC; 1min 56s ago
    Docs: http://nginx.org/en/docs/
    Process: 14491 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, status=1/FAILURE)
Jun 03 20:33:30 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Jun 03 20:33:30 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 20:33:30 my.draglabs.com systemd[1]: nginx.service: Start request repeated too quickly.
Jun 03 20:33:30 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 20:33:30 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
root@my:/home/yellowtent/platformdata/nginx/old# systemctl restart nginx
Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xe" for details.
root@my:/home/yellowtent/platformdata/nginx/old# cd ..
root@my:/home/yellowtent/platformdata/nginx# ls
applications cert mime.types old
root@my:/home/yellowtent/platformdata/nginx# re
-
@roofboard So, you just have to delete the app config files in /etc/nginx/applications and then run systemctl restart nginx and systemctl restart box.
When you restart box, it will regenerate the nginx config for the dashboard alone. Once you have access to the dashboard, you can go to the Location section of each app and click save. This will regenerate the nginx config of the app.
/etc/nginx/nginx.conf should be:
user www-data;

# detect based on available CPU cores
worker_processes auto;

# this is 4096 by default. See /proc/<PID>/limits and /etc/security/limits.conf
# usually twice the worker_connections (one for uptsream, one for downstream)
# see also LimitNOFILE=16384 in systemd drop-in
worker_rlimit_nofile 8192;

pid /run/nginx.pid;

events {
    # a single worker has these many simultaneous connections max
    worker_connections 4096;
}

http {
    include mime.types;
    default_type application/octet-stream;

    # required for long host names
    server_names_hash_bucket_size 128;

    access_log /var/log/nginx/access.log combined;

    sendfile on;

    # timeout for client to finish sending headers
    client_header_timeout 30s;

    # timeout for reading client request body (successive read timeout and not whole body!)
    client_body_timeout 60s;

    # keep-alive connections timeout in 65s. this is because many browsers timeout in 60 seconds
    keepalive_timeout 65s;

    # zones for rate limiting
    limit_req_zone $binary_remote_addr zone=admin_login:10m rate=10r/s; # 10 request a second

    include applications/*.conf;
}
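Before restarting, it may help to let nginx test the restored file first, since `nginx -t` reports the first file it cannot parse or load. The wrapper below is a sketch, not something Cloudron ships; the function name is made up and the default path is the one from the ExecStart line in the status output above.

```shell
# Sketch only: restart_nginx_if_valid is a hypothetical helper name.
# `nginx -t` parses the config (and the files it references) without starting the server;
# `-c` selects the config file to test.
restart_nginx_if_valid() {
    local conf="${1:-/etc/nginx/nginx.conf}"
    if nginx -t -c "$conf"; then
        systemctl restart nginx
    else
        echo "config test failed for $conf; not restarting" >&2
        return 1
    fi
}
```

Running `restart_nginx_if_valid` with no argument tests /etc/nginx/nginx.conf and only restarts nginx if the test passes, which avoids burning through systemd's restart counter.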
-
@girish
Ok, I restored the nginx.conf file in the yellowtent directory, then moved everything in /etc/nginx/applications into a new folder called old. Did a restart and it is still not getting there...
root@my:/etc/nginx/applications# ls
0fa72b5f-441d-4bef-bee3-665f4d85dc3e.conf 4b5dbf96-42b4-4a13-9b9f-15d5228dce9c.conf a1c46e70-b09e-419f-8461-3e8e40da3870.conf b3cbed12-eecc-42f2-93ba-b0834a3b3f5b.conf default.conf
1a907fb3-616a-4b71-930d-c132adc14357.conf 4eaa7fe2-9c72-46c7-946e-f7ed41891a72.conf a9948920-c8d0-4e14-9139-45ce8a78b549.conf b892da04-793f-4449-a6d4-ed8564455d46.conf e67529c6-edb3-47a5-890f-580adc2d7c61.conf
3d520625-8452-4e93-87c7-e03f89e4286b.conf 9cbc7dcd-5202-4e5f-9730-9491d8dc4077.conf abfd70d6-750a-4621-9072-82da26e9df8f.conf bdfaef04-4f9d-433e-aaf7-44e6146acb01.conf my.draglabs.com.conf
root@my:/etc/nginx/applications# sudo mv *.conf old/
root@my:/etc/nginx/applications# ls
old
@roofboard
When I try to start nginx in one tab and run journalctl -u nginx -fa in another tab, this is the error I am getting:
Jun 03 20:58:21 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 4.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 20:58:21 my.draglabs.com nginx[22106]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Start request repeated too quickly.
Jun 03 20:58:21 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 20:58:21 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
-
@roofboard said in out of space error leading to missing certs:
Jun 03 20:58:21 my.draglabs.com nginx[22106]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Some nginx config file is loading this key (the configs are under /etc/nginx/applications/*; you can temporarily move all files there somewhere else). Can you please check which one? That conf needs to be deleted and then nginx has to be restarted. The reason nginx is not starting is most likely that the key is a 0 byte file.
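A sketch of how one might answer both questions from the shell: which config mentions the key, and whether the key is empty. The helper names are hypothetical; the default directory is the one named in this post.

```shell
# Sketch only: helper names are made up for illustration.

# Print every config file under CONF_DIR that mentions NEEDLE (fixed string, not a regex).
configs_referencing() {
    local needle="$1" conf_dir="${2:-/etc/nginx/applications}"
    grep -rlF -- "$needle" "$conf_dir"
}

# Succeed if FILE exists as a regular file but is empty (0 bytes).
is_zero_byte() {
    [ -f "$1" ] && [ ! -s "$1" ]
}
```

For the error above that would be `configs_referencing '_.draglabs.com.key'` and `is_zero_byte /home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key && echo empty`.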
-
@girish hmmmm
When I moved the conf files all the way out of the nginx folder into /old, deleted the app config files in /etc/nginx/applications, and ran systemctl restart nginx and systemctl restart box, nginx started momentarily but Cloudron would not load. I rebooted and tried to start nginx using the command systemctl restart nginx
and below is the output from journalctl -u nginx -fa
Jun 03 21:15:12 my.draglabs.com nginx[12053]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:12 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 1.
Jun 03 21:15:12 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:12 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 21:15:12 my.draglabs.com nginx[12062]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:12 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:12 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 2.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 21:15:13 my.draglabs.com nginx[12068]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 3.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 21:15:13 my.draglabs.com nginx[12070]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 4.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Starting nginx - high performance web server...
Jun 03 21:15:13 my.draglabs.com nginx[12072]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Stopped nginx - high performance web server.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Start request repeated too quickly.
Jun 03 21:15:13 my.draglabs.com systemd[1]: nginx.service: Failed with result 'exit-code'.
Jun 03 21:15:13 my.draglabs.com systemd[1]: Failed to start nginx - high performance web server.
-
@girish said in out of space error leading to missing certs:
/etc/nginx/applications/
The key being referred to is definitely a zero byte file. Also, draglabs.com is the main domain I log in to. Is it possible that the conf is being regenerated with a pointer to /home/yellowtent/platformdata/nginx/cert/_.draglabs.com.key?
-
@girish said in out of space error leading to missing certs:
re-generate the nginx config for the dashboard alone. Once you have access to the dashboard, you can go to Location section of each app and click save. This will regenerate nginx config of the app
FIXED!!!
It is difficult to tell whether deleting the conf files from the folder /etc/nginx/applications, restarting unbound, and then running systemctl restart nginx and systemctl restart box contributed to the fix.
I say that because unbound definitely was not working at one point.
And as I remember, nginx did start momentarily. However, the solution came when I deleted the corrupted zero byte private key from the folder /home/yellowtent/platformdata/nginx/cert/
When that file was deleted, I was able to log in without SSL using Firefox. Once in, under the Domains and Certs section of Cloudron, I was able to click Renew All Certs. That fixed SSL, and I was able to go into each program and reassign the DNS settings by clicking save.
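For anyone repeating this, a slightly more cautious variant of the step above is to move the empty files aside instead of deleting them. This is a sketch under that assumption: the function name and the backup directory are made up, and moving rather than deleting is a substitution for what was actually done in this post.

```shell
# Sketch only: quarantine_empty_files is a hypothetical helper.
# Usage: quarantine_empty_files CERT_DIR BACKUP_DIR
# Moves every 0-byte regular file directly inside CERT_DIR into BACKUP_DIR
# and prints the original path of each file it moved.
quarantine_empty_files() {
    local cert_dir="$1" backup_dir="$2"
    mkdir -p "$backup_dir" || return 1
    find "$cert_dir" -maxdepth 1 -type f -size 0 -exec mv -- {} "$backup_dir"/ \; -print
}
```

For example, `quarantine_empty_files /home/yellowtent/platformdata/nginx/cert /root/empty-certs`, followed by the nginx and box restarts and the Renew All Certs step described above.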
-
-
-
@girish there is no way to trigger certificate renewal over the (SSH) console?
I had a bug (a couple of months ago) I never reported, where stopped apps did not get a new cert and nginx failed to launch because of outdated/invalid certs, making Cloudron break (no nginx --> no dashboard) on system reboot. Fixed it by just copying over current cert files from working (non-stopped) apps. They were obviously not valid for those stopped apps, but I was able to start nginx, start the stopped apps and renew their certs.
So in short: Would be nice to have a way to trigger cert renewal over console command and/or extend the troubleshoot guide with cert related stuff.
-
Also, this whole issue was caused by running out of space. I took a look at some of the other posts on out-of-space crashes and can tell it is a difficult problem to solve.
Supposedly there is a running-out-of-space warning, but I never got that warning.
I was thinking that a good solution for the out-of-space error would involve taking the cron job which calculates remaining space every 'n' minutes and integrating its result over 'x' hours to arrive at a time to disk full.
This could relatively accurately predict whether an out-of-space crash is pending or imminent and, if so, do things like stop processes or prevent backups (if backing up to the local filesystem).
Essentially
- predict the crash with a pinch of calculus.
- send a warning to the administrator.
- follow a contingency to protect the server.
I say this because I can imagine many ways this could happen, and my example is ONLY one of them. Any program could fill the disk and crash Cloudron: I could have been copying video files, it could have been NextCloud, or it could have been a spam attack on a mailserver.
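One simple reading of the prediction step is a linear extrapolation from two usage samples. The sketch below assumes exactly that (it is not how Cloudron's checker works); the function name and argument order are made up for illustration.

```shell
# Sketch only: estimate_hours_to_full is a hypothetical helper.
# Usage: estimate_hours_to_full USED_THEN USED_NOW TOTAL HOURS_BETWEEN
# All sizes must be in the same unit (for example KiB as printed by df).
# Prints the estimated hours until the disk is full, or "inf" if usage is not growing.
estimate_hours_to_full() {
    awk -v a="$1" -v b="$2" -v total="$3" -v dt="$4" 'BEGIN {
        if (dt <= 0 || b <= a) { print "inf"; exit }
        rate = (b - a) / dt                  # growth per hour
        printf "%.1f\n", (total - b) / rate  # remaining space divided by growth rate
    }'
}
```

For example, going from 60 to 70 units used on a 100 unit disk over 2 hours is a growth of 5 units per hour, so `estimate_hours_to_full 60 70 100 2` estimates 6.0 hours until full. A real checker would probably want more than two samples to smooth out bursts.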
-
@roofboard yes, agreed. I don't like the way it currently is, where filling up disk space brings everything down. Currently, we have a simple cron checker which gives alerts when the disk is nearing some amount of used space, but this fails in many cases because it runs only every 6 hours or so (it's not run too often to prevent disk churn).
I think a good long-term solution is to figure out how to limit disk usage of apps. I think in another thread there is an idea that maybe all appdata can be stored in an XFS partition. We can then enforce quotas on apps.
-
@subven said in out of space error leading to missing certs:
nginx failed to launch because of outdated/invalid certs, making Cloudron break (no nginx --> no dashboard) on system reboot.
Yes, indeed, this is a bug. As @roofboard also found out, the code checks whether a cert file exists but not whether it is corrupt. I will get this fixed, so at the very least, restarting the box code will get the dashboard back up.
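Until that fix lands, the difference between "exists" and "is usable" can be checked by hand with openssl. This is a sketch of such a check, not the box code; the function names are made up, and it assumes the openssl command-line tool is installed.

```shell
# Sketch only: helper names are hypothetical.

# Succeed if FILE is a non-empty private key that openssl can parse.
# -passin pass: supplies an empty passphrase so an encrypted key fails instead of prompting.
key_is_valid() {
    [ -s "$1" ] && openssl pkey -in "$1" -noout -passin pass: >/dev/null 2>&1
}

# Succeed if FILE is a non-empty certificate that openssl can parse.
cert_is_valid() {
    [ -s "$1" ] && openssl x509 -in "$1" -noout >/dev/null 2>&1
}
```

A zero byte key like the one in this thread fails the `-s` test immediately; a truncated or otherwise garbled file fails the openssl parse, which is the same "no start line" condition nginx reported.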
-
@girish said in out of space error leading to missing certs:
only every 6 hours or so
The predictive aspect of @roofboard's suggestion is also a good one: track the rate of change, perhaps checking more frequently as we approach higher thresholds (>80%) and less frequently when out of the danger zone (<80%).
Combining this with an email to the admin, which is more likely to be seen than a UI notification, would be great until we add mobile notifications via external messaging services, which is in the pipeline.
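The adaptive check frequency could look something like the sketch below. Only the 80% threshold and the roughly 6 hour baseline come from this thread; the 95% tier, the 5 and 30 minute intervals and the function name are assumptions made up for illustration.

```shell
# Sketch only: check_interval_minutes is a hypothetical helper.
# Usage: check_interval_minutes USAGE_PERCENT
# Prints how many minutes to wait before the next disk check.
check_interval_minutes() {
    local pct="$1"
    if [ "$pct" -ge 95 ]; then
        echo 5        # assumed: nearly full, check very often
    elif [ "$pct" -ge 80 ]; then
        echo 30       # assumed: inside the danger zone, check often
    else
        echo 360      # outside the danger zone, about every 6 hours
    fi
}
```

A checker would call this after each run with the current usage percentage and schedule its next run accordingly.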
-
@girish said in out of space error leading to missing certs:
@roofboard yes, agreed. I don't like the way it currently is, where filling up disk space brings everything down. Currently, we have a simple cron checker which gives alerts when the disk is nearing some amount of used space, but this fails in many cases because it runs only every 6 hours or so (it's not run too often to prevent disk churn).
I think a good long-term solution is to figure out how to limit disk usage of apps. I think in another thread there is an idea that maybe all appdata can be stored in an XFS partition. We can then enforce quotas on apps.
A good shorter-term solution would be to make the level below which the alert is sent configurable. Depending on whether you use your server for storing text files or for downloading video, your "low disk" tolerance will be wildly different.
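As a sketch of what a configurable level could mean in practice: the variable name MIN_FREE_MIB, its default of 5120 and the function name below are all made up for illustration and are not existing Cloudron settings.

```shell
# Sketch only: MIN_FREE_MIB and should_alert are hypothetical names.
# Usage: should_alert FREE_MIB
# Succeeds (exit status 0) when free space has dropped below the configured level.
should_alert() {
    local free_mib="$1"
    local min_free="${MIN_FREE_MIB:-5120}"   # assumed default: alert below 5 GiB free
    [ "$free_mib" -lt "$min_free" ]
}
```

A text-only server might set `MIN_FREE_MIB=1024`, while one that downloads video might set it far higher, so the alert arrives while there is still time to react.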
-
-
-
@subven said in out of space error leading to missing certs:
@girish there is no way to trigger certificate renewal over the (SSH) console?
I'd like an answer to this question.. as I just ran into the missing cert problem too.
Having deleted all the conf/cert files and gotten nginx started, the UI is still not accessible after a box restart. All apps are inaccessible too.
A box restart seems to recreate /etc/nginx/applications/my.domain.conf BUT doesn't check whether /home/yellowtent/platformdata/nginx/certs/my.domain.host.cert is there.
How are they regenerated from the CLI?
-