OH OH. First Vikunja was failing with an update due to something wrong with nginx, now the entire Cloudron is down.
-
Here is the error log for the initial Vikunja fail:
May 08 09:47:47 box:apptask run: app error for state pending_restore: BoxError: Error reloading nginx: reload exited with code 1 signal null
    at reload (/home/yellowtent/box/src/reverseproxy.js:188:22)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Object.unconfigureApp (/home/yellowtent/box/src/reverseproxy.js:607:5)
    at async install (/home/yellowtent/box/src/apptask.js:271:5) { reason: 'Nginx Error', details: {} }
May 08 09:47:47 box:taskworker Task took 0.464 seconds
May 08 09:47:47 box:tasks setCompleted - 18797: {"result":null,"error":{"stack":"BoxError: Error reloading nginx: reload exited with code 1 signal null\n at reload (/home/yellowtent/box/src/reverseproxy.js:188:22)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async Object.unconfigureApp (/home/yellowtent/box/src/reverseproxy.js:607:5)\n at async install (/home/yellowtent/box/src/apptask.js:271:5)","name":"BoxError","reason":"Nginx Error","details":{},"message":"Error reloading nginx: reload exited with code 1 signal null"}}
May 08 09:47:47 box:tasks update 18797: {"percent":100,"result":null,"error":{"stack":"BoxError: Error reloading nginx: reload exited with code 1 signal null\n at reload (/home/yellowtent/box/src/reverseproxy.js:188:22)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async Object.unconfigureApp (/home/yellowtent/box/src/reverseproxy.js:607:5)\n at async install (/home/yellowtent/box/src/apptask.js:271:5)","name":"BoxError","reason":"Nginx Error","details":{},"message":"Error reloading nginx: reload exited with code 1 signal null"}}
[no timestamp] Error reloading nginx: reload exited with code 1 signal null
[no timestamp]     at reload (/home/yellowtent/box/src/reverseproxy.js:188:22)
[no timestamp]     at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
[no timestamp]     at async Object.unconfigureApp (/home/yellowtent/box/src/reverseproxy.js:607:5)
[no timestamp]     at async install (/home/yellowtent/box/src/apptask.js:271:5)
May 08 09:48:22 box:taskworker Starting task 18798. Logs are at /home/yellowtent/platformdata/logs/51667508-ab88-4d9d-ba18-be51560acaad/apptask.log
May 08 09:48:22 box:apptask run: startTask installationState: pending_restore runState: running
May 08 09:48:22 box:tasks update 18798: {"percent":10,"message":"Cleaning up old install"}
May 08 09:48:22 box:shell reload /usr/bin/sudo -S /home/yellowtent/box/src/scripts/restartservice.sh nginx
May 08 09:48:23 box:shell reload: /usr/bin/sudo -S /home/yellowtent/box/src/scripts/restartservice.sh nginx errored BoxError: reload exited with code 1 signal null
[no timestamp]     at ChildProcess.<anonymous> (/home/yellowtent/box/src/shell.js:110:19)
[no timestamp]     at ChildProcess.emit (node:events:513:28)
[no timestamp]     at ChildProcess._handle.onexit (node:internal/child_process:291:12) {
[no timestamp]   reason: 'Shell Error',
[no timestamp]   details: {},
[no timestamp]   code: 1,
[no timestamp]   signal: null
[no timestamp] }
Now the entire Cloudron is down. (I tried to reboot the whole thing from the Cloudron dashboard, thinking it might bump nginx back into health... oops.)
No other apps seemed to be down. In fact, I had just upgraded one.
The message upon logging into the server includes:
=> There is 1 zombie process.
Expanded Security Maintenance for Applications is not enabled.
71 updates can be applied immediately.
To see these additional updates run: apt list --upgradable
1 additional security update can be applied with ESM Apps.
Learn more about enabling ESM Apps service at https://ubuntu.com/esm
I rebooted the entire VPS next while logged in via ssh, and still get no response. Checking unbound says it's fine. Checking nginx, though:
May 08 08:38:56 my.toutdo.com systemd[1]: Starting Unbound DNS Resolver...
May 08 08:38:57 my.toutdo.com unbound[795]: [795:0] notice: init module 0: subnet
May 08 08:38:57 my.toutdo.com unbound[795]: [795:0] notice: init module 1: validator
May 08 08:38:57 my.toutdo.com unbound[795]: [795:0] notice: init module 2: iterator
May 08 08:38:57 my.toutdo.com unbound[795]: [795:0] info: start of service (unbound 1.13.1).
May 08 08:38:57 my.toutdo.com systemd[1]: Started Unbound DNS Resolver.
May 08 08:38:58 my.toutdo.com unbound[795]: [795:0] info: generate keytag query _ta-4f66. NULL IN
root@my:~# systemctl status nginx
× nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/nginx.service.d
             └─cloudron.conf
     Active: failed (Result: exit-code) since Wed 2024-05-08 08:39:49 UTC; 1s ago
       Docs: man:nginx(8)
    Process: 15865 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=1/FAILURE)
        CPU: 298ms
May 08 08:39:49 my.toutdo.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
May 08 08:39:49 my.toutdo.com systemd[1]: Stopped A high performance web server and a reverse proxy server.
May 08 08:39:49 my.toutdo.com systemd[1]: nginx.service: Start request repeated too quickly.
May 08 08:39:49 my.toutdo.com systemd[1]: nginx.service: Failed with result 'exit-code'.
May 08 08:39:49 my.toutdo.com systemd[1]: Failed to start A high performance web server and a reverse proxy server.
When I run
systemctl restart docker
nginx doesn't appear as running. And finally, running
journalctl -xeu nginx.service
gives me:
root@my:~# journalctl -xeu nginx.service
░░
░░ The unit nginx.service has entered the 'failed' state with result 'exit-code'.
May 08 08:45:45 my.toutdo.com systemd[1]: Failed to start A high performance web server and a reverse proxy server.
░░ Subject: A start job for unit nginx.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit nginx.service has finished with a failure.
░░
░░ The job identifier is 3976 and the job result is failed.
May 08 08:45:46 my.toutdo.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
░░ Subject: Automatic restarting of a unit has been scheduled
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ Automatic restarting of the unit nginx.service has been scheduled, as the result for
░░ the configured Restart= setting for the unit.
May 08 08:45:46 my.toutdo.com systemd[1]: Stopped A high performance web server and a reverse proxy server.
░░ Subject: A stop job for unit nginx.service has finished
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A stop job for unit nginx.service has finished.
░░
░░ The job identifier is 4051 and the job result is done.
May 08 08:45:46 my.toutdo.com systemd[1]: nginx.service: Start request repeated too quickly.
May 08 08:45:46 my.toutdo.com systemd[1]: nginx.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ The unit nginx.service has entered the 'failed' state with result 'exit-code'.
May 08 08:45:46 my.toutdo.com systemd[1]: Failed to start A high performance web server and a reverse proxy server.
░░ Subject: A start job for unit nginx.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit nginx.service has finished with a failure.
░░
░░ The job identifier is 4051 and the job result is failed.
Hopefully there is a fix. Thank you!
-
did you check the disk space?
-
@BrutalBirdie Yes, checked and disk space is fine.
I did subsequently find this post - https://forum.cloudron.io/topic/6506/after-a-reboot-nginx-does-not-start/4, and found a totally separate app with a borked nginx conf, so I deleted those, and then the Cloudron started back up fine. (EDIT: turns out this app had been stopped, which is a question girish asked in https://forum.cloudron.io/post/43502.)
I was then able to restore to a Vikunja backup from a few days ago to get it up and running.
I will now try to upgrade the Vikunja app again to see what's happening. It is possible its own upgrade failed because nginx could not reload (and then kept restarting too quickly) due to the other app's borkedness.
-
All fixed. It appears it was the nginx conf of the stopped app which was throwing off nginx and causing an update to fail, then the restore to fail, then the entire Cloudron to fail. Following the suggestions in https://forum.cloudron.io/topic/6506/after-a-reboot-nginx-does-not-start/4 helped me get it all going again.
I'll leave this up though for reference.
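For anyone landing here with the same symptom: the systemctl status output above shows nginx.service failing at ExecStartPre=/usr/sbin/nginx -t -q, which is nginx's own config test, and neither systemctl status nor journalctl printed the reason. Running the config test by hand should print it. Below is a rough sketch, assuming bash and a root shell on the server. The helper name nginx_emerg_lines is made up for illustration and is not part of nginx or Cloudron; all it does is keep the "[emerg]" lines, which normally name the file or directive nginx refuses to load.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nginx or Cloudron): read `nginx -t`
# output on stdin and keep only the "[emerg]" lines. Those lines usually
# name the directive, config file, or certificate that nginx cannot load.
nginx_emerg_lines() {
    # -F matches the literal string "[emerg]" instead of a bracket expression.
    # `|| true` keeps the exit status at 0 when there is nothing to report.
    grep -F '[emerg]' || true
}
```

Usage would be `nginx -t 2>&1 | nginx_emerg_lines` (nginx writes the test result to stderr, hence the 2>&1). Once the offending conf is dealt with, `systemctl reset-failed nginx.service` followed by `systemctl start nginx` clears the "Start request repeated too quickly" state, since systemd stops retrying after hitting its start limit.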