Update to Cloudron 5.4 crashes complete Cloudron server
-
I just updated (or tried to update) to 5.4, and the whole Cloudron server crashed: all apps down, dashboard down. CPU busy, no network activity.
@nebulon and @girish were not available via chat, and I coincidentally had a 2-day-old server snapshot, so I could restore at the server level.
Now I'm missing 2 days of data and, worse, 2 days of email!
How can I quickly restore at least the email from the last backup (new mail is still coming in)?
-
Hi,
what exactly is crashing, so we can fix it? Is there something in the logs?
Regarding the restore, I'm not sure I understand correctly: have you already restored a server snapshot from 2 days ago, or do you plan to?
If the latter, please enable remote SSH support with cloudron-support --enable-ssh and email support@cloudron.io with your domain. Then we can debug and fix this.
-
@nebulon sorry, but in the meantime I was busy for hours getting everything back online. This is what I did and found out:
- I restored the 2-day-old server snapshot, as at that time it was the only thing I could do.
- I looked for a way to restore a backup of the box (because of the email) while running the 2-day-old snapshot, but the docs say nothing about restoring the box.
- I "rented" a new NetCup VPS and started it with the Cloudron image, but restoring the backup was not possible because that image is already on 5.4.
- I reinstalled the VPS with Ubuntu and manually installed Cloudron 5.3.4.
- Then I was able to restore the backup made just before the 5.4 upgrade.
Lesson learned: always make a snapshot before a Cloudron update.
Question: how can one restore the box (or specifically, email) on a running Cloudron? A few years ago, a user on our previous host (DirectAdmin) accidentally deleted all the folders in his mailbox, and I was able to restore just his mailbox, since that host had an hourly backup system.
-
Do you happen to remember what was crashing, so we can investigate the root cause?
Restoring the main Cloudron system (database, mailboxes and so on, i.e. the "box") on its own is not possible, as apps depend on it, and rolling it back while the apps are already using newer state could result in inconsistencies.
Generally, I think what you did in the end was the correct approach: restoring the whole server on a new VPS using the backup made prior to the update. We may have to improve that flow further; in such situations the stress level is already high, so the restore path should be smooth.
-
@nebulon I SSH'd into the server during the "crash" and ran top. I remember seeing only a few processes generating load, something like "Docker" and "Node".
I tried a reboot, but after that I couldn't even SSH into the VPS anymore. That was the moment I decided to rebuild/move.
Here is a screenshot from my Zabbix dashboard: the backup started around 7:10 and finished just after 8:05, and then the update started and "crashed".
-
Just the latest update: the fresh VPS with the restored Cloudron Pro automatically updated to 5.4 last night without any issues. My two other Cloudron Pro instances were also updated automatically, and those updates went well too.
Next time (hopefully there won't be one), I will download the logs before restoring, to make debugging possible.