Dashboard and applications unreachable after recovery from full disk
-
What happened:
The remote backup CIFS storage got disconnected. Backups continued on local storage and eventually filled up the disk. Applications started to fail.
What I did so far:
I removed the local backup images manually to free up disk space. A "reboot required" notification was pending, so I rebooted via the dashboard. The server rebooted, but the dashboard and applications are unreachable. The reason: nginx fails to start.
I tried
/home/yellowtent/box/setup/start.sh
without success. nginx still failed.
I deleted all nginx config files, ran start.sh again, and rebooted. nginx still fails.
I used
apt-get remove $(dpkg --list 'linux-image*' | grep ^ii | awk '{print $2}' | grep -v "$(uname -r)")
and then
/home/yellowtent/box/setup/start.sh
again. Still nginx failed to start.
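For readers following along: the apt-get remove line above appears to rely on command substitution to select installed kernel image packages other than the running one. A dry-run bash sketch that only prints what such a filter would select, without removing anything, could look like this. The function name old_kernel_packages is made up for illustration, and the sketch assumes the usual dpkg --list column layout (state in column 1, package name in column 2).

```shell
# Hypothetical helper: read `dpkg --list 'linux-image*'` output on stdin and print
# installed (state "ii") package names that do not contain the given kernel release.
old_kernel_packages() {
    local running="$1"
    # grep exits non-zero when nothing is left; treat that as "no old kernels".
    awk '$1 == "ii" { print $2 }' | grep -v -F -- "$running" || true
}

# Dry run on a live system (prints names only, removes nothing):
#   dpkg --list 'linux-image*' | old_kernel_packages "$(uname -r)"
```

Reviewing the printed list before passing it to apt-get remove avoids deleting more than intended.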
After a reboot nginx is running now. Yet I still cannot reach the dashboard or any other application. I don't know how to proceed.
The output of
systemctl status box
is
● box.service - Cloudron Admin
     Loaded: loaded (/etc/systemd/system/box.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-08-19 09:04:43 UTC; 19min ago
   Main PID: 997 (node)
      Tasks: 11 (limit: 9443)
     Memory: 69.4M (max: 400.0M)
     CGroup: /system.slice/box.service
             └─997 node /home/yellowtent/box/box.js
Aug 19 09:04:48 1001364-346 sudo[2454]: pam_unix(sudo:session): session closed for user root
Aug 19 09:04:48 1001364-346 sudo[2471]: yellowtent : unable to resolve host 1001364-346
Aug 19 09:04:48 1001364-346 sudo[2471]: pam_unix(sudo:session): session opened for user root by (uid=0)
Aug 19 09:04:55 1001364-346 sudo[2471]: pam_unix(sudo:session): session closed for user root
Aug 19 09:04:55 1001364-346 sudo[3672]: yellowtent : unable to resolve host 1001364-346
Aug 19 09:04:55 1001364-346 sudo[3672]: pam_unix(sudo:session): session opened for user root by (uid=0)
Aug 19 09:04:55 1001364-346 sudo[3672]: pam_unix(sudo:session): session closed for user root
Aug 19 09:05:25 1001364-346 sudo[4561]: yellowtent : unable to resolve host 1001364-346
Aug 19 09:05:25 1001364-346 sudo[4561]: pam_unix(sudo:session): session opened for user root by (uid=0)
Aug 19 09:05:25 1001364-346 sudo[4561]: pam_unix(sudo:session): session closed for user root
root@1001364-346:~#
-
@whitespace What does sudo systemctl status nginx say?
-
● nginx.service - nginx - high performance web server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/nginx.service.d
             └─cloudron.conf
     Active: active (running) since Fri 2022-08-19 09:04:42 UTC; 24min ago
       Docs: http://nginx.org/en/docs/
   Main PID: 822 (nginx)
      Tasks: 3 (limit: 9443)
     Memory: 6.7M
     CGroup: /system.slice/nginx.service
             ├─822 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
             ├─825 nginx: worker process
             └─826 nginx: worker process
Aug 19 09:04:41 1001364-346 systemd[1]: Starting nginx - high performance web server...
Aug 19 09:04:42 1001364-346 systemd[1]: nginx.service: Can't open PID file /run/nginx.pid (yet?) after start: Operation not permitted
Aug 19 09:04:42 1001364-346 systemd[1]: Started nginx - high performance web server.
-
nginx failed again.
Here is the output of
systemctl status nginx
nginx.service - nginx - high performance web server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/nginx.service.d
             └─cloudron.conf
     Active: failed (Result: exit-code) since Fri 2022-08-19 09:42:58 UTC; 13s ago
       Docs: http://nginx.org/en/docs/
    Process: 14162 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, status=1/FAILURE)
Aug 19 09:42:57 1001364-346 systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Aug 19 09:42:57 1001364-346 systemd[1]: nginx.service: Failed with result 'exit-code'.
Aug 19 09:42:57 1001364-346 systemd[1]: Failed to start nginx - high performance web server.
Aug 19 09:42:58 1001364-346 systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Aug 19 09:42:58 1001364-346 systemd[1]: Stopped nginx - high performance web server.
Aug 19 09:42:58 1001364-346 systemd[1]: nginx.service: Start request repeated too quickly.
Aug 19 09:42:58 1001364-346 systemd[1]: nginx.service: Failed with result 'exit-code'.
Aug 19 09:42:58 1001364-346 systemd[1]: Failed to start nginx - high performance web server.
-
In addition, here is the output of
journalctl -xe
Automatic restarting of the unit nginx.service has been scheduled, as the result for
-- the configured Restart= setting for the unit.
Aug 19 09:44:44 1001364-346 systemd[1]: Stopped nginx - high performance web server.
-- Subject: A stop job for unit nginx.service has finished
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A stop job for unit nginx.service has finished.
--
-- The job identifier is 2293 and the job result is done.
Aug 19 09:44:44 1001364-346 systemd[1]: nginx.service: Start request repeated too quickly.
Aug 19 09:44:44 1001364-346 systemd[1]: nginx.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit nginx.service has entered the 'failed' state with result 'exit-code'.
Aug 19 09:44:44 1001364-346 systemd[1]: Failed to start nginx - high performance web server.
-- Subject: A start job for unit nginx.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit nginx.service has finished with a failure.
--
-- The job identifier is 2293 and the job result is failed.
-
@whitespace have you already tried @robi's link?
That is, deleting the app config files and hitting the save button in the "Location" view in the UI to regenerate the nginx config for each app?
-
@rmdes Yes, I have tried @robi's link. I deleted all .conf files and certs in nginx/applications and nginx/certs, then restarted unbound, nginx, and box.
Since I cannot reach the UI, I have no way to trigger recreation of the certs. I can only reach the server via SSH. The dashboard is not working.
-
Then please gather the restart portion of the nginx and box logs so we can see what the actual failure is.
Is box regenerating the nginx configs after restart?
If so, post the config file that is causing the failure to start. Alternatively, you can email support@ for additional assistance.
-
@whitespace I had a similar issue yesterday. Please post the needed logs so we can see the real error message and assist further.
Or, as @robi suggested, write an email to support@cloudron.io
-
Output of
systemctl status nginx.service
is
● nginx.service - nginx - high performance web server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/nginx.service.d
             └─cloudron.conf
     Active: failed (Result: exit-code) since Fri 2022-08-19 10:18:19 UTC; 6s ago
       Docs: http://nginx.org/en/docs/
    Process: 5047 ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf (code=exited, status=1/FAILURE)
Aug 19 10:18:19 1001364-346 systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Aug 19 10:18:19 1001364-346 systemd[1]: Stopped nginx - high performance web server.
Aug 19 10:18:19 1001364-346 systemd[1]: nginx.service: Start request repeated too quickly.
Aug 19 10:18:19 1001364-346 systemd[1]: nginx.service: Failed with result 'exit-code'.
Aug 19 10:18:19 1001364-346 systemd[1]: Failed to start nginx - high performance web server.
Regarding "Is box regenerating the nginx configs after restart?":
It seems so. They appear inside the folder and have the expected default values.
box.log output after
systemctl restart box
is
2022-08-19T10:26:05.740Z box:server ==========================================
2022-08-19T10:26:05.748Z box:server Cloudron 7.2.5
2022-08-19T10:26:05.748Z box:server ==========================================
2022-08-19T10:26:05.794Z box:settings initCache: pre-load settings
2022-08-19T10:26:05.808Z box:tasks stopAllTasks: stopping all tasks
2022-08-19T10:26:05.808Z box:shell stopTask spawn: /usr/bin/sudo -S /home/yellowtent/box/src/scripts/stoptask.sh all
2022-08-19T10:26:05.820Z box:shell stopTask (stdout): sudo: unable to resolve host 1001364-346: Name or service not known
Cloudron is up and running. Logs are at /home/yellowtent/platformdata/logs/box.log
2022-08-19T10:26:05.864Z box:shell removeCollectdProfile spawn: /usr/bin/sudo -S /home/yellowtent/box/src/scripts/configurecollectd.sh remove cloudron-backup
2022-08-19T10:26:05.871Z box:shell removeCollectdProfile (stdout): sudo: unable to resolve host 1001364-346: Name or service not known
2022-08-19T10:26:05.875Z box:shell removeCollectdProfile (stdout): Restarting collectd
2022-08-19T10:26:05.939Z box:shell removeCollectdProfile (stdout): Removing collectd stats of cloudron-backup
2022-08-19T10:26:05.946Z box:reverseproxy writeDashboardConfig: writing admin config for myserver.com
Can't open /home/yellowtent/platformdata/nginx/cert/myserver.com.host.cert for reading, No such file or directory
139761371608384:error:02001002:system library:fopen:No such file or directory:../crypto/bio/bss_file.c:69:fopen('/home/yellowtent/platformdata/nginx/cert/myserver.com.host.cert','r')
139761371608384:error:2006D080:BIO routines:BIO_new_file:no such file:../crypto/bio/bss_file.c:76:
unable to load certificate
2022-08-19T10:26:05.969Z box:shell reload spawn: /usr/bin/sudo -S /home/yellowtent/box/src/scripts/restartservice.sh nginx
2022-08-19T10:26:05.978Z box:shell reload (stdout): sudo: unable to resolve host 1001364-346: Name or service not known
2022-08-19T10:26:05.996Z box:shell reload (stdout): nginx: [emerg] cannot load certificate "/home/yellowtent/platformdata/nginx/cert/myserver.com.host.cert": BIO_new_file() failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/home/yellowtent/platformdata/nginx/cert/myserver.com.host.cert','r') error:2006D080:BIO routines:BIO_new_file:no such file)
2022-08-19T10:26:05.996Z box:shell reload code: 1, signal: null
2022-08-19T10:26:05.997Z box:cloudron Startup task at index 2 failed: Error reloading nginx: reload exited with code 1 signal null
2022-08-19T10:26:06.014Z box:cloudron onActivated: running post activation tasks
2022-08-19T10:26:06.014Z box:platform initializing addon infrastructure
2022-08-19T10:26:06.015Z box:platform platform is uptodate at version 49.0.0
2022-08-19T10:26:06.015Z box:platform onPlatformReady: platform is ready. infra changed: false
2022-08-19T10:26:06.015Z box:apps schedulePendingTasks: scheduling app tasks
2022-08-19T10:26:06.024Z box:cron startJobs: starting cron jobs
2022-08-19T10:26:06.035Z box:cron backupConfigChanged: schedule 00 00 3,23 * * 0 (Europe/Berlin)
2022-08-19T10:26:06.038Z box:cron autoupdatePatternChanged: pattern - 00 00 1,3,5,23 * * 1,3,4,6 (Europe/Berlin)
2022-08-19T10:26:06.040Z box:cron Dynamic DNS setting changed to false
2022-08-19T10:26:06.041Z box:dockerproxy startDockerProxy: started proxy on port 3003
2022-08-19T10:26:10.217Z box:apphealthmonitor app health: 7 alive / 1 dead.
2022-08-19T10:26:20.155Z box:apphealthmonitor app health: 7 alive / 1 dead.
2022-08-19T10:26:30.190Z box:apphealthmonitor app health: 7 alive / 1 dead.
-
@whitespace that is not an nginx log. See how it says systemd[1]?
-
journalctl -u nginx -fa
gives me
Aug 19 10:50:42 1001364-346 systemd[1]: Starting nginx - high performance web server...
Aug 19 10:50:42 1001364-346 nginx[11878]: nginx: [emerg] cannot load certificate key "/home/yellowtent/platformdata/nginx/cert/_.myserver.com.key": PEM_read_bio_PrivateKey() failed (SSL: error:0909006C:PEM routines:get_name:no start line:Expecting: ANY PRIVATE KEY)
Aug 19 10:50:42 1001364-346 systemd[1]: nginx.service: Control process exited, code=exited, status=1/FAILURE
Aug 19 10:50:42 1001364-346 systemd[1]: nginx.service: Failed with result 'exit-code'.
Aug 19 10:50:42 1001364-346 systemd[1]: Failed to start nginx - high performance web server.
Aug 19 10:50:42 1001364-346 systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
Aug 19 10:50:42 1001364-346 systemd[1]: Stopped nginx - high performance web server.
Aug 19 10:50:42 1001364-346 systemd[1]: nginx.service: Start request repeated too quickly.
Aug 19 10:50:42 1001364-346 systemd[1]: nginx.service: Failed with result 'exit-code'.
Aug 19 10:50:42 1001364-346 systemd[1]: Failed to start nginx - high performance web server.
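In journal output like the above, the line that names the actual cause is the one nginx itself tags with [emerg]; the systemd[1] lines only report that the start failed. A minimal bash sketch for pulling out just those lines, assuming the unit is called nginx as in this thread (the function name nginx_emerg_lines is made up for illustration):

```shell
# Hypothetical helper: keep only nginx "[emerg]" lines from log text on stdin.
nginx_emerg_lines() {
    # -F treats the brackets literally; no match is not an error here.
    grep -F '[emerg]' || true
}

# Usage on a live system (needs permission to read the journal):
#   journalctl -u nginx --no-pager -n 200 | nginx_emerg_lines
```

The same filter can be applied to box.log, since box also records the nginx error when its reload fails.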
-
@robi said in Dashboard and applications unreachable after recovery from full disk:
@whitespace that is not an nginx log. See how it says systemd[1]?
But it does include the nginx error:
2022-08-19T10:26:05.996Z box:shell reload (stdout): nginx: [emerg] cannot load certificate "/home/yellowtent/platformdata/nginx/cert/myserver.com.host.cert": BIO_new_file() failed (SSL: error:02001002:system library:fopen:No such file or directory:fopen('/home/yellowtent/platformdata/nginx/cert/myserver.com.host.cert','r') error:2006D080:BIO routines:BIO_new_file:no such file)
nginx is missing certificates. @nebulon fixed this yesterday for one of my instances.
It might be best to send a message to support.
-
@whitespace from another message in the log, it doesn't appear that the system knows its own hostname.
Please make sure to set one so it sticks after reboot.
@BrutalBirdie yes, I missed that from the long scroll to the right.
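One common cause of sudo's "unable to resolve host" message is that the machine's hostname has no entry in /etc/hosts. That is an assumption here, not something confirmed in this thread. A small bash sketch for checking it, with a made-up function name hosts_file_has_name:

```shell
# Hypothetical helper: succeed if a non-comment line in the hosts file lists the name.
hosts_file_has_name() {
    local hosts_file="$1" name="$2"
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }            # skip whole-line comments
        {
            for (i = 2; i <= NF; i++) {      # field 1 is the IP address
                if ($i ~ /^#/) break         # stop at a trailing comment
                if ($i == n) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    ' "$hosts_file"
}

# Example on a live system:
#   hosts_file_has_name /etc/hosts "$(hostname)" || echo "hostname missing from /etc/hosts"
```

If the check fails, adding the hostname to /etc/hosts is the usual way to make the mapping persist across reboots.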
-
@whitespace I replied to you from support@. We have to delete the nginx config file and also the bad key/cert, and then run "systemctl restart box". This will regenerate the config file and the certs.
In the next release, this is done automatically. We check whether the key/cert files are bad and re-sync things to disk automatically.
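For anyone hitting the same thing: the nginx error in this thread ("no start line") means the key file did not contain a PEM BEGIN line, which can happen when a file is written while the disk is full. A rough bash sketch for spotting such files before deleting them follows. The function names are made up, the check is only a heuristic (it does not validate the key or cert cryptographically), and it is not how Cloudron itself detects bad files.

```shell
# Hypothetical helper: succeed only if the file is non-empty and has a PEM "-----BEGIN" line.
has_pem_start_line() {
    [ -s "$1" ] && grep -q -e '^-----BEGIN ' "$1"
}

# Hypothetical helper: print every .key/.cert file in a directory that fails the check.
list_bad_pem_files() {
    local dir="$1" f
    for f in "$dir"/*.key "$dir"/*.cert; do
        [ -e "$f" ] || continue              # skip glob patterns that matched nothing
        has_pem_start_line "$f" || printf '%s\n' "$f"
    done
}

# Example (directory taken from the logs in this thread):
#   list_bad_pem_files /home/yellowtent/platformdata/nginx/cert
```

Files listed by this check are candidates for deletion before restarting box.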
-
Resolved in support@
-
Thank you very much for resolving the issue. I would love to understand what I did wrong.
I tried deleting the nginx.conf and certs and letting them be regenerated, as you have described in other threads. Yet I did not manage to get nginx back up and running. Are there additional steps besides deleting configs and restarting box, nginx, and unbound?
Muchas Gracias for saving the day!