6.3.3 a few quirks
-
From the log:
POST /api/v1/tasks/3302/stop 409 Conflict task is not active 12.361 ms - 61
Jun 25 19:54:06 box:scheduler sync: error creating jobs (HTTP code 400) unexpected - No such container: def7655c9e0a287873abde909aea7f722c6795a26d91b4ab355d2f3106043ea8
Symptom: "Configuring (Queued) - Queued" in a WordPress app after updating one Cloudron with
39 apps. Pressing the x for cancellation does not have any effect. No backup tasks or anything else running. Fun fact: a restart does not work either (in the sense of pressing the button in the dashboard). Tomorrow I will try to reboot via shell.
@luckow the second line can be ignored, it's not related to the first. The stop button does not work immediately after an update because the update is sort of still "happening" in the background (we want the user to see the dashboard as early as possible). After some time, do you see all the configuring things moving ahead? Only at that point will the stop button work. I guess "queued" is a geeky term, but it means that at some point a task will be started.
-
Waiting (let's say) 1 hour for updating/configuring all 39 apps is OK. 7 hours later, it looks like "something is broken". That is why I want to wait until tomorrow and restart the instance via shell.
-
@girish No. As I said in the first comment: no backup task. That was my first idea from a Cloudron administrator's point of view.
-
@luckow Thanks for access to the server, I found the issue but I'm not sure why it happens yet. The box code temporarily lost connection to the database and that caused some state to be lost. Workaround: I did a "systemctl restart box" and it picked up from where it left off.
Thanks for reporting! Let's see how common this is.
2021-06-25T12:07:36.791Z box:apps BoxError: connect ETIMEDOUT
    at /home/yellowtent/box/src/settingsdb.js:26:36
    at Query.queryCallback [as _callback] (/home/yellowtent/box/src/database.js:96:42)
    at Query.Sequence.end (/home/yellowtent/box/node_modules/mysql/lib/protocol/sequences/Sequence.js:83:24)
    at /home/yellowtent/box/node_modules/mysql/lib/Pool.js:205:13
    at Handshake.onConnect (/home/yellowtent/box/node_modules/mysql/lib/Pool.js:58:9)
    at Handshake.<anonymous> (/home/yellowtent/box/node_modules/mysql/lib/Connection.js:526:10)
    at Handshake._callback (/home/yellowtent/box/node_modules/mysql/lib/Connection.js:488:16)
    at Handshake.Sequence.end (/home/yellowtent/box/node_modules/mysql/lib/protocol/sequences/Sequence.js:83:24)
    at Protocol.handleNetworkError (/home/yellowtent/box/node_modules/mysql/lib/protocol/Protocol.js:369:14)
    at PoolConnection.Connection._handleNetworkError (/home/yellowtent/box/node_modules/mysql/lib/Connection.js:418:18)
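For anyone hitting the same stuck state, a minimal sketch of that workaround, assuming the box code runs as the systemd unit "box" and writes its log under /home/yellowtent/platformdata/logs/box.log:
# restart the box service so it picks up from where it left off
sudo systemctl restart box
# then follow the box log to confirm the queued tasks resume
tail -f /home/yellowtent/platformdata/logs/box.log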
-
I have a similar issue where 55 apps were backed up, but then many had to reconfigure so that took 20m, and all of the apps redirect to the "you found a cloudron in the wild" page.
Restarted Nginx from services menu, no change.
Restarted box from CLI, no change. (It seems we could also use a restart button for the cloudron service.)
All my apps are offline.
Update: After 30m or so, apps are coming back online.
-
The update has to do a "rebuild" of all the containers and services because of the multi-host changes. This can take a bit depending on how fast the server CPU and disk are. It's best to follow the progress under
/home/yellowtent/platformdata/logs/box.log
. That said, 30m does seem quite excessive. Maybe the databases are taking a long time to come up? If you can see in the logs where the most time is spent, that would be good to know.
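A small sketch of how you might follow that and spot where the time goes (the grep pattern is just an illustration, not an official marker):
# follow the platform log live during the rebuild
tail -f /home/yellowtent/platformdata/logs/box.log
# or, after the fact, skim for task/container messages and look for long gaps between timestamps
grep -E 'task|container' /home/yellowtent/platformdata/logs/box.log | less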
-
@girish Hi Girish, I think I had mentioned this a long time ago, but for what it's worth... I think it'd be helpful to include a bit of an "info/alert" in the release notes when there is going to be more downtime than usual.
For me, it's alarming and jarring to see a random lengthy update when the usual update time to Cloudron is only a minute or two. So when it unexpectedly takes 10-30+ minutes (depending on performance of the server I guess) due to the containers needing to be re-created for example, it's not the best user experience.
I tend to make "30 minute" maintenance windows to sort of cover myself (during off-peak hours, of course), so the length of downtime isn't necessarily the main issue. But for me at least, I was caught by surprise at how long it was taking and started to panic a little, thinking something was wrong, because it's so unusual for it to take that long. Thankfully I saw the containers being recreated when I looked at the system logs, so I knew it was okay and just needed some time. But I think having that pre-warning would be very helpful, so admins at least know to expect a lengthier update instead of the usual update time.
-
Having the same issue.
Is this what others experienced?
@girish These are my nginx logs... I don't have physical access to the server right now but I can drive over to take a look if you have ideas.
-
Ok so now after a restart it looks like nginx just refuses to start...
Fuck... guess I'm nuking my install, restoring from Wasabi, and praying to the deities I don't believe in that it just goes back to normal.
@staff any idea what might have gone wrong? Can I set up a time with one of you to do a supervised update?
Also I guess the contents of my volumes are fucked as well...
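In case it helps anyone else debugging the same thing, these would be the standard first checks when nginx refuses to start (plain nginx/systemd tooling, nothing Cloudron-specific):
# validate the config; this prints the exact file and line nginx is choking on
sudo nginx -t
# see the service's own error output
sudo systemctl status nginx
sudo journalctl -u nginx -n 50 --no-pager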
-
This seems to be the error...
-
@atridad That's not good, do you think I can have access to your server? If you enable SSH access, I can look into it first thing tomorrow. Thanks, and sorry for the issue!
@girish Sure. I'll DM you
EDIT: That works. Only thing is I can't access the dashboard, which makes it hard to enable SSH access. I can make you an account if you need? I'm going to be up for the next 2 hours or so, so let me know and I can set things up.
-
@msbt said in 6.3.3 a few quirks:
cloudron-support --enable-ssh
I love you.
Seriously though, thanks. I was pacing around trying to figure out how to do this, lol
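For anyone else locked out of the dashboard, a quick sketch of that recovery path (the hostname below is a placeholder; run the command from the server's console or an existing root session):
# re-enable SSH access without going through the dashboard
sudo cloudron-support --enable-ssh
# then connect as usual
ssh root@my.example.com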
-
Ok, at least for @atridad the issue was that, for some reason, old dashboard nginx configs were still around and referencing old SSL certificates which had since been purged from the system. This made nginx fail to start up, causing the rest of the side effects.
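A hedged sketch of how such a stale config could be tracked down; the /etc/nginx/applications/ directory is an assumption about where Cloudron keeps its per-app vhosts, so adjust the path to wherever nginx -t points:
# nginx -t names the exact file and line referencing the missing certificate
sudo nginx -t
# list the vhost configs and the certificate paths they reference (directory is an assumption)
grep -rn 'ssl_certificate' /etc/nginx/applications/
# after removing or regenerating the stale config, restart nginx
sudo systemctl restart nginx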