OH OH. First Vikunja was failing with an update due to something wrong with nginx, now the entire Cloudron is down.

scooke

Here is the error log for the initial Vikunja fail:

May 08 09:47:47 box:apptask run: app error for state pending_restore: BoxError: Error reloading nginx: reload exited with code 1 signal null at reload (/home/yellowtent/box/src/reverseproxy.js:188:22) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async Object.unconfigureApp (/home/yellowtent/box/src/reverseproxy.js:607:5) at async install (/home/yellowtent/box/src/apptask.js:271:5) { reason: 'Nginx Error', details: {} }
May 08 09:47:47 box:taskworker Task took 0.464 seconds
May 08 09:47:47 box:tasks setCompleted - 18797: {"result":null,"error":{"stack":"BoxError: Error reloading nginx: reload exited with code 1 signal null\n at reload (/home/yellowtent/box/src/reverseproxy.js:188:22)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async Object.unconfigureApp (/home/yellowtent/box/src/reverseproxy.js:607:5)\n at async install (/home/yellowtent/box/src/apptask.js:271:5)","name":"BoxError","reason":"Nginx Error","details":{},"message":"Error reloading nginx: reload exited with code 1 signal null"}}
May 08 09:47:47 box:tasks update 18797: {"percent":100,"result":null,"error":{"stack":"BoxError: Error reloading nginx: reload exited with code 1 signal null\n at reload (/home/yellowtent/box/src/reverseproxy.js:188:22)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async Object.unconfigureApp (/home/yellowtent/box/src/reverseproxy.js:607:5)\n at async install (/home/yellowtent/box/src/apptask.js:271:5)","name":"BoxError","reason":"Nginx Error","details":{},"message":"Error reloading nginx: reload exited with code 1 signal null"}}
[no timestamp]  Error reloading nginx: reload exited with code 1 signal null
[no timestamp]  at reload (/home/yellowtent/box/src/reverseproxy.js:188:22)
[no timestamp]  at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
[no timestamp]  at async Object.unconfigureApp (/home/yellowtent/box/src/reverseproxy.js:607:5)
[no timestamp]  at async install (/home/yellowtent/box/src/apptask.js:271:5)
May 08 09:48:22 box:taskworker Starting task 18798. Logs are at /home/yellowtent/platformdata/logs/51667508-ab88-4d9d-ba18-be51560acaad/apptask.log
May 08 09:48:22 box:apptask run: startTask installationState: pending_restore runState: running
May 08 09:48:22 box:tasks update 18798: {"percent":10,"message":"Cleaning up old install"}
May 08 09:48:22 box:shell reload /usr/bin/sudo -S /home/yellowtent/box/src/scripts/restartservice.sh nginx
May 08 09:48:23 box:shell reload: /usr/bin/sudo -S /home/yellowtent/box/src/scripts/restartservice.sh nginx errored BoxError: reload exited with code 1 signal null
[no timestamp]  at ChildProcess.<anonymous> (/home/yellowtent/box/src/shell.js:110:19)
[no timestamp]  at ChildProcess.emit (node:events:513:28)
[no timestamp]  at ChildProcess._handle.onexit (node:internal/child_process:291:12) {
[no timestamp]  reason: 'Shell Error',
[no timestamp]  details: {},
[no timestamp]  code: 1,
[no timestamp]  signal: null
[no timestamp]  }

Now the entire Cloudron is down (I tried to reboot the whole thing from the Cloudron dashboard, thinking it might bump nginx back to health... oops).

No other apps seemed to be down. In fact, I had just upgraded one.

The message upon logging into the server includes:

  => There is 1 zombie process.

Expanded Security Maintenance for Applications is not enabled.

71 updates can be applied immediately.
To see these additional updates run: apt list --upgradable

1 additional security update can be applied with ESM Apps.
Learn more about enabling ESM Apps service at https://ubuntu.com/esm

Next I rebooted the entire VPS while logged in via ssh, and still got no response. Checking unbound says it's fine. Checking nginx, though:

May 08 08:38:56 my.toutdo.com systemd[1]: Starting Unbound DNS Resolver...
May 08 08:38:57 my.toutdo.com unbound[795]: [795:0] notice: init module 0: subnet
May 08 08:38:57 my.toutdo.com unbound[795]: [795:0] notice: init module 1: validator
May 08 08:38:57 my.toutdo.com unbound[795]: [795:0] notice: init module 2: iterator
May 08 08:38:57 my.toutdo.com unbound[795]: [795:0] info: start of service (unbound 1.13.1).
May 08 08:38:57 my.toutdo.com systemd[1]: Started Unbound DNS Resolver.
May 08 08:38:58 my.toutdo.com unbound[795]: [795:0] info: generate keytag query _ta-4f66. NULL IN
root@my:~# systemctl status nginx
× nginx.service - A high performance web server and a reverse proxy server
     Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/nginx.service.d
             └─cloudron.conf
     Active: failed (Result: exit-code) since Wed 2024-05-08 08:39:49 UTC; 1s ago
       Docs: man:nginx(8)
    Process: 15865 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=1/FAILURE)
        CPU: 298ms

May 08 08:39:49 my.toutdo.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
May 08 08:39:49 my.toutdo.com systemd[1]: Stopped A high performance web server and a reverse proxy server.
May 08 08:39:49 my.toutdo.com systemd[1]: nginx.service: Start request repeated too quickly.
May 08 08:39:49 my.toutdo.com systemd[1]: nginx.service: Failed with result 'exit-code'.
May 08 08:39:49 my.toutdo.com systemd[1]: Failed to start A high performance web server and a reverse proxy server.

When I run systemctl restart docker, nginx still doesn't appear as running.

And finally, running journalctl -xeu nginx.service gives me:

root@my:~# journalctl -xeu nginx.service
░░ 
░░ The unit nginx.service has entered the 'failed' state with result 'exit-code'.
May 08 08:45:45 my.toutdo.com systemd[1]: Failed to start A high performance web server and a reverse proxy server.
░░ Subject: A start job for unit nginx.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ A start job for unit nginx.service has finished with a failure.
░░ 
░░ The job identifier is 3976 and the job result is failed.
May 08 08:45:46 my.toutdo.com systemd[1]: nginx.service: Scheduled restart job, restart counter is at 5.
░░ Subject: Automatic restarting of a unit has been scheduled
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ Automatic restarting of the unit nginx.service has been scheduled, as the result for
░░ the configured Restart= setting for the unit.
May 08 08:45:46 my.toutdo.com systemd[1]: Stopped A high performance web server and a reverse proxy server.
░░ Subject: A stop job for unit nginx.service has finished
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ A stop job for unit nginx.service has finished.
░░ 
░░ The job identifier is 4051 and the job result is done.
May 08 08:45:46 my.toutdo.com systemd[1]: nginx.service: Start request repeated too quickly.
May 08 08:45:46 my.toutdo.com systemd[1]: nginx.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ The unit nginx.service has entered the 'failed' state with result 'exit-code'.
May 08 08:45:46 my.toutdo.com systemd[1]: Failed to start A high performance web server and a reverse proxy server.
░░ Subject: A start job for unit nginx.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░ 
░░ A start job for unit nginx.service has finished with a failure.
░░ 
░░ The job identifier is 4051 and the job result is failed.

Hopefully there is a fix. Thank you!

BrutalBirdie

did you check the disk space?

scooke

@BrutalBirdie Yes, checked and disk space is fine.

I did subsequently find this post - https://forum.cloudron.io/topic/6506/after-a-reboot-nginx-does-not-start/4, and found a totally separate app with a borked nginx conf, so I deleted it, and then the Cloudron started back up fine. (EDIT: it turns out this app had been stopped, which is something girish asked about in https://forum.cloudron.io/post/43502.)

I was then able to restore to a Vikunja backup from a few days ago to get it up and running.

I will now try to upgrade the Vikunja app again to see what's happening. It is possible its own upgrade failed because of nginx restarting too quickly due to the other app's borkedness.

scooke

All fixed. It appears it was the nginx conf of the stopped app which was throwing off nginx and causing an update to fail, then the restore to fail, then the entire Cloudron to fail. Following the suggestions in https://forum.cloudron.io/topic/6506/after-a-reboot-nginx-does-not-start/4 helped me get it all going again.

I'll leave this up though for reference.
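
For anyone landing here with the same symptoms: the systemctl and journalctl output above only shows that the ExecStartPre check (nginx -t) exited with status 1, not which file it rejected. Running nginx -t by hand usually prints the actual error. Below is a minimal sketch of a helper that pulls the config file paths out of that output. The function name failing_nginx_confs is made up for illustration, and it assumes the error message has the common "... in /path/to/file.conf:LINE" form. Errors that don't name a file that way (a certificate that can't be loaded, for example, may be reported differently) won't be picked up, so read the raw output too.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Cloudron or nginx.
# Reads `nginx -t` output on stdin and prints each config file that an
# error message points at. Assumes messages shaped like
#   nginx: [emerg] ... in /some/path/app.conf:12
# Paths containing spaces are not handled.
failing_nginx_confs() {
    grep -oE 'in /[^ :]+:[0-9]+' \
        | sed -E 's/^in //; s/:[0-9]+$//' \
        | sort -u
}

# Usage on the server, as root. nginx -t writes its diagnostics to stderr,
# so stderr is redirected into the pipe:
#   nginx -t 2>&1 | failing_nginx_confs
```

If this prints the conf of an app you did not expect (a stopped one, in my case), that is the file to look at before deleting anything.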

nebulon

That broken nginx config issue again. After deleting the config, one can also submit the location/subdomain form via the app configure screen; that will generate a new nginx config for that app.

girish

You can run cloudron-support --troubleshoot and it will delete bad nginx configs.

scooke

@girish Ooh, that sounds handy! I almost wish for another broken nginx config just to try it!
