Ubuntu Reboot to install updates left some Apps not responding
-
@marcusquinn I can report slightly different behaviour, though it overlaps with the reboot-for-security-updates part. When updates are pending I usually reboot from Cloudron's UI. I did that this time too, but after about 15 minutes (it is normally back online in under 4 minutes) it was still unresponsive and showing the red "offline" banner it displays when a connection drops. So I went into the VPS provider UI and rebooted the VPS itself (a sort of hard reboot), which resolved it; it was back online about 3 minutes later. I think this is the first time it hasn't come back online after a reboot through the Cloudron UI.
-
@d19dotca So, some of the apps don't start up? Which apps are these, and what do you see in their app logs? I am wondering if there is some timing issue with respect to the order in which databases start up. Also, are all the services up in Services?
-
@girish Not quite. That's how @marcusquinn described it, but it's not the exact behaviour I saw. It seems both @imc67 and I had issues where the Cloudron never comes back up at all (initially anyway; I can confirm this second round behaved the same way, and it will be interesting to see whether @imc67 sees that too the second time around). No dashboard, can't SSH, nothing. It's as if the server was shut down instead of rebooted. When I reboot from the VPS provider dashboard, it then comes back up right away.
-
If any of your server providers offer graphs/stats, is there 100% CPU use around the time Cloudron is not responding? We have had a few complaints in the past about Cloudron going unresponsive randomly, but we never got to the bottom of it.
-
@girish I discovered the following:
I have 3 Cloudron Pros:
2 at NetCup (6 and 10 CPUs) and
1 at DO (2 CPUs).
The ones at NetCup run without issues; the one at DO is the one that became unresponsive after a GUI reboot and needed a VPS reboot.
I expect a security reboot is coming (I see there are 9 security updates waiting, and @d19dotca seems to have had it already), so I'll take screenshots of the DO graphs around the reboot.
Regards,
Marcel.
-
@imc67 here are the results:
Cloudron 1 on Netcup: no issues
Cloudron 2 on Netcup: only Bitwarden didn't start; the dashboard said "running" but it was not!
Cloudron 3 on DO: no issues this time (but the time before, Cloudron didn't start). Below are the DO stats (reboot around 22:45h):
-
@girish So, interestingly enough (and this may mean my issue is different from the others, so I can open a separate thread if you prefer), I had to reboot the server today, but not for security updates and not for anything Cloudron needed to do. When I used the Cloudron Reboot button on the System page, it behaved the same way: it's as if it does a shutdown rather than a restart for me. I waited about 5 minutes (the most I could wait) but it never came back, and all SSH attempts were met with Connection Refused. Yet within maybe 15 seconds of rebooting the server through the VPS console instead, I could connect over SSH again and everything came back up. It seems something broke with reboots around 5.5, at least for my node. I'm not sure whether this is the same as what others are seeing; perhaps my environment is behaving a bit differently.
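
For anyone who wants to record what SSH is doing while waiting for a reboot, here is a minimal sketch of a port probe. It is only an illustration, not anything Cloudron ships; the function name `probe_port` and the result labels are made up for this example. The idea is that a refused connection typically means something on the network path answered and rejected the connection, while a powered-off machine more often shows up as a timeout, so logging which one you get may help narrow things down.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt.

    Returns 'open', 'refused', 'timeout', or 'unreachable'.
    The labels are hypothetical names chosen for this sketch.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The connection attempt was actively rejected.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError:
        # Anything else: no route to host, DNS failure, and so on.
        return "unreachable"
```

Calling `probe_port("my.server.example", 22)` every few seconds from another machine during the reboot (the hostname is a placeholder) gives a rough timeline of when SSH goes away and whether it ever comes back.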
-
I wonder if this is because we do a "sync" before the reboot (https://git.cloudron.io/cloudron/box/-/blob/master/src/scripts/reboot.sh#L16). Maybe the sync is making it appear as if things are "stuck". This code hasn't changed in ages though. Did you happen to attach or change the disk type around the 5.5 time frame?
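
To test the sync theory without rebooting, one could time a sync on the affected server. The snippet below is a rough sketch under that assumption and is not taken from reboot.sh; `timed_sync` is a name invented here. It uses Python's `os.sync()`, which is only available on Unix.

```python
import os
import time
from typing import Callable


def timed_sync(sync: Callable[[], None] = os.sync) -> float:
    """Flush filesystem buffers and return how many seconds the call took.

    The sync function is injectable so the timing logic can be exercised
    without actually flushing disks.
    """
    start = time.monotonic()
    sync()
    return time.monotonic() - start
```

Running `print(timed_sync())` on the server shortly before a planned reboot would show whether the flush alone takes seconds or minutes. A long time here would suggest slow disk writes rather than a problem with the reboot itself.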
-
@girish No new disks or anything, no. I only have one ext4 disk attached, and it has been there for a few months now; its sole use is for Cloudron backups. No app storage or anything like that is on that ext4 disk. All I can really say is that I first noticed it at the time this thread was posted (so about 16-17 days ago, which I realize wasn't really 5.5, so I guess it's not a 5.5 issue). That was after the round of Ubuntu updates where Cloudron alerted me that a reboot was needed, and it was the first reboot that didn't go smoothly for me. The whole server looked completely unavailable: SSH sessions and everything else break, sites won't load, and it never comes back until I do a more forceful reboot from the VPS provider console.
-
Just to add, my Cloudron backups aren't working, so for good measure I just did this:
1. Went into my Hetzner VPS Console and clicked the Attempt Shutdown button. It worked and the server powered down.
2. Ran a snapshot.
3. Powered the server back on.
After this Cloudron was back up but none of my apps were responding.
After then doing a Reboot within Cloudron they all came back.
But I wonder why they didn't come back up after initially powering up the server.
-
Not sure exactly how snapshots on Hetzner work, but it could be that the healthchecker was simply a bit confused and the apps would have become healthy after some time. How long did you wait before you decided to reboot again?
Also, in such a case it might give more insight to look at the app logs and see whether the apps are in fact running or not.
-
@nebulon the server was totally powered off whilst doing the full Hetzner snapshot (a full image of the server).
I'm not sure how long I waited, but longer than it normally takes for things to come back up after doing a reboot from within Cloudron.
All the apps were showing as Not responding in the Dashboard, nor were they reachable at their URLs, so I'm pretty certain they weren't running!
Is it even possible for them to show as Not responding in the Dashboard but actually be running?
-
@jdaviescoates said in Ubuntu Reboot to install updates left some Apps not responding:
Is it even possible for them to show as Not responding in the Dashboard but actually be running?
Yes. 'Not responding' is simply based on a periodic health check. This can fail for two reasons:
a) some internal bug is preventing the cron job that checks health from firing, and
b) the app's health check route returns a non-2xx HTTP code. This can happen if your app is totally "protected", say with an auth screen on the main page that returns a 401/403 status.
In both cases, the app itself is actually running and can be accessed from the browser, but the dashboard will say 'not responding'. a) is a bug and b) is more of a Cloudron limitation, I think (this happens often in LAMP apps).
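
To illustrate case b), here is a minimal sketch of a health check that treats only 2xx responses as healthy. This is not Cloudron's actual implementation; the function names, the status labels, and the timeout are assumptions made for the example. It shows how an app that answers with 401 or 403 on its health route would be reported as not responding even though it is up and serving requests.

```python
from urllib import error, request


def classify_status(status: int) -> str:
    """Treat only 2xx HTTP status codes as healthy (assumed rule for this sketch)."""
    return "responding" if 200 <= status < 300 else "not responding"


def check_health(url: str, timeout: float = 5.0) -> str:
    """Fetch a health check URL once and classify the result."""
    try:
        with request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx; the app answered, just not with 2xx.
        return classify_status(exc.code)
    except (error.URLError, OSError):
        # Connection refused, DNS failure, timeout: the app really is unreachable.
        return "not responding"
```

With this rule, `classify_status(401)` gives 'not responding' just as a refused connection would, which is why the dashboard status alone cannot distinguish a protected app from a stopped one. Note that `urlopen` follows redirects by default, so a 3xx on the health route would be classified by the final response.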