Auto-update to 8.3 - various apps down - database issue
-
Not very happy that 8.3 rolled itself out when it doesn’t seem stable.
And it had to happen while I’m in the middle of projects, which means I don’t have time to research fixes.
So, seeking to leverage the power of the community (not being lazy, of course): is there a standard fix path for errors like this one?
Error: Error setting up postgresql. Status code: 500 message: the database system is in recovery mode
I have 10 apps down, some of which are core productivity apps like paperless and gitlab.
-
Seems that:
- some can be fixed with a simple Retry Task
- some with Retry Task, then Restart app/container
- some stay as not responding despite those steps
Will have to investigate tomorrow.
-
Update: after manually doing Retry Task and Restart on all apps in an error state (sometimes multiple times), and restarting postgres and the relevant redis, I could resolve all except:
- keycloak: a new installation, so I uninstalled it and will re-install later
- onlyoffice: I don’t use it much, so I uninstalled it and am trying a new installation
So I’m going to close this, but it’s worrying that it happened.
UPDATE: I don’t see how to close this. Can @staff please do so?
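The manual escalation described above (Retry Task, then Restart, repeated a few times, with whatever is left going to reinstall or restore) can be written down as a small toy model. Everything in it is hypothetical: `App`, `actions_needed` and `recover` are illustrative names, not Cloudron APIs, and the postgres/redis service restarts are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Toy model, NOT a Cloudron API: an app "recovers" after a given number of
# manual actions; None means it never recovers (left for reinstall/restore).
@dataclass
class App:
    name: str
    actions_needed: Optional[int]
    actions_taken: List[str] = field(default_factory=list)

    def healthy(self) -> bool:
        return (self.actions_needed is not None
                and len(self.actions_taken) >= self.actions_needed)

# The manual ladder from the thread: Retry Task first, then Restart.
LADDER = ("retry_task", "restart")

def recover(apps: List[App], max_rounds: int = 3) -> List[str]:
    """Walk each app up the ladder, repeating it up to max_rounds times.

    Returns the names of apps that are still unhealthy afterwards.
    """
    unresolved = []
    for app in apps:
        for _ in range(max_rounds):
            for step in LADDER:
                if app.healthy():
                    break
                app.actions_taken.append(step)
        if not app.healthy():
            unresolved.append(app.name)
    return unresolved

if __name__ == "__main__":
    apps = [
        App("app-a", actions_needed=1),     # fixed by a simple Retry Task
        App("app-b", actions_needed=2),     # Retry Task, then Restart
        App("app-c", actions_needed=None),  # never comes back
    ]
    print(recover(apps))  # ['app-c']
```

The point of the model is only that the ladder is cheap to repeat and that anything still failing after a few rounds needs a different remedy, which matches what happened with keycloak and onlyoffice here.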
-
Last night my big production Cloudron (out of 3) was updated, but I'd noticed the pending update beforehand, so after reading about the issues I increased the postgres memory a lot and made sure all apps were updated first. Luckily it all went well, but it would've been a disaster if this one had had longer downtime.
-
This is a rather incomprehensible blunder on the part of Cloudron! The postgres databases have all been nuked!
I'm having to manually restore almost all of the affected apps, which is a very time-intensive task, not to mention the several hours of unscheduled downtime these services have been under.
-
This is strange to read, as I updated to 8.3.0 over the weekend without any issues (other than a couple of SpamAssassin rules I needed to update because of deprecations in the newer version of SA used in Cloudron 8.3.0). Definitely no outages other than a slightly delayed startup while the databases migrated.
For what it’s worth, I had 4 GB capacity allocated to the PostgreSQL database service, so maybe that helped?
-
@d19dotca I, too, have 4GB allocated to the Postgres service. Besides that, the resource graphs didn't even go anywhere near the threshold during the update process.
-
@shrey Interesting, I wonder what the magic number for memory might be, then, if that was the root cause. It likely depends on each server's environment too; it's probably something like X times the current/average use. My PG usage is typically only around 512 MB at most, often less. So 4 GB was several times more than my daily usage numbers suggest.
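Plugging this post's numbers into that ratio (taking 4 GB as 4096 MB and 512 MB as the typical peak the poster reports, not a measured worst case), the allocation on this server is about 8 times the daily peak. Whether that headroom is what avoided the failure is, as the thread says, still unknown.

```latex
\frac{\text{limit}}{\text{typical peak}}
  = \frac{4\ \text{GB}}{512\ \text{MB}}
  = \frac{4096\ \text{MB}}{512\ \text{MB}}
  = 8
```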
On a side note, regarding the graphs: in my experience they are not too valuable, I think because (correct me if I'm wrong) usage is only recorded every 5 minutes. So if postgres hits its max memory usage and restarts within those 5 minutes, for example, it won't necessarily be recorded. I may be wrong, but from my general experience I tend to take the graphs with a grain of salt.
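A small illustration of why a coarse sampling interval can hide a short spike. It assumes one reading every 5 minutes, as the poster recalls (not verified against Cloudron), and uses made-up numbers: a 400 MB baseline with a 2-minute spike to 4096 MB. `sampled_peak` and `usage_mb` are hypothetical helpers.

```python
# Assumed sampling interval: one reading every 5 minutes (the poster's recollection).
SAMPLE_INTERVAL_S = 5 * 60

def sampled_peak(usage_at, duration_s, interval_s=SAMPLE_INTERVAL_S):
    """Highest value seen when usage_at(t) is only read every interval_s seconds."""
    return max(usage_at(t) for t in range(0, duration_s + 1, interval_s))

def usage_mb(t):
    # Hypothetical memory curve: 400 MB baseline, spike to 4096 MB for t in [420, 540).
    return 4096 if 420 <= t < 540 else 400

if __name__ == "__main__":
    window_s = 15 * 60
    print(sampled_peak(usage_mb, window_s))                # 400: spike falls between samples
    print(max(usage_mb(t) for t in range(window_s + 1)))   # 4096: the true peak
```

With readings at 0, 300, 600 and 900 seconds, none lands inside the spike, so the graph would show a flat 400 MB even though memory briefly hit the limit.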
-
@shrey said in Auto-update to 8.3 - various apps down - database issue:
The postgres databases have all been nuked!
This is how the upgrade is carried out: the databases are exported, a fresh postgres is started, and then they are all reimported. During this process, postgres does have unlimited memory.
However, for reasons we have yet to figure out, on some servers the reimport seems to fail because postgres is somehow busy. So far, we have no logs showing why it fails. The fix is as you found out: just redo the reimport by restoring the apps (the upgrade automated this, but the failure makes the end user do this manually).
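The export, fresh start and reimport sequence described above can be sketched as a toy model of the reimport step. This is not Cloudron's code: `restore`, `DbError`, the attempt count and the backoff are hypothetical, and retrying when postgres reports "the database system is in recovery mode" is only one possible mitigation for the "postgres is somehow busy" failure, not a confirmed fix.

```python
import time
from typing import Callable, Dict, List

# Error text seen in this thread when postgres was not ready to accept work.
RECOVERY_MSG = "the database system is in recovery mode"

class DbError(Exception):
    """Hypothetical error raised by the restore callable."""

def reimport_all(dumps: Dict[str, bytes],
                 restore: Callable[[str, bytes], None],
                 attempts: int = 5,
                 delay_s: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Reimport every dump, retrying with backoff while postgres is in recovery.

    Returns the names whose reimport still failed; in the thread, those are
    the apps the user ends up restoring by hand.
    """
    failed = []
    for name, dump in dumps.items():
        for attempt in range(attempts):
            try:
                restore(name, dump)
                break  # reimported successfully
            except DbError as err:
                # Give up on errors other than recovery mode, or on the last attempt.
                if RECOVERY_MSG not in str(err) or attempt == attempts - 1:
                    failed.append(name)
                    break
                sleep(delay_s * (2 ** attempt))  # exponential backoff: 2s, 4s, 8s, ...
    return failed

if __name__ == "__main__":
    busy = {"app-a": 2}  # app-a's database is "busy" for its first two tries

    def fake_restore(name: str, dump: bytes) -> None:
        if busy.get(name, 0) > 0:
            busy[name] -= 1
            raise DbError("FATAL: " + RECOVERY_MSG)
        if name == "app-c":
            raise DbError("some other failure")

    print(reimport_all({"app-a": b"", "app-b": b"", "app-c": b""},
                       fake_restore, sleep=lambda s: None))  # ['app-c']
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a failure that is not about recovery mode is reported immediately rather than retried.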
-
@girish said in Auto-update to 8.3 - various apps down - database issue:
but the failure makes the end user do this manually
Well, this manual process cost me several hours of downtime of 'production' services, as well as another couple of hours for restoring them (a lot of the backup files are really big, e.g. immich) 🫤
-
SHAME ON YOU! I just woke up to a mess of Cloudron servers all messed up by this 8.3 update. To make matters worse, almost every single database had to be restored. Some of this caused data loss, because the apps on some accounts are used for logging 24/7. I learned a valuable lesson: turn off auto-update.
-
@girish Man, this is bad.
-
@CptPlastic if the Cloudron is still in an error state, we would like to take a look at it. Can you email us at support@cloudron.io?
-
Not sure strong words help the cause; it is not like we introduce bugs or slack on testing on purpose.
I wonder where the data loss comes in, though. There should only be a small window between the app backup and the app going down, and no data can be changed or added while the app is down.
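To make that reasoning concrete: restoring an app from its pre-update backup rolls back whatever was written between that backup and the moment the app went down, which should be a short window. For an app that receives logs 24/7, a second window may matter: events sent while the app was down, assuming the sender does not buffer and retry. The sketch uses hypothetical minute timestamps, not data from the thread, and `lost_events` is an illustrative name.

```python
from typing import List, Tuple

def lost_events(event_times: List[int], backup_at: int, down_at: int,
                restored_at: int) -> Tuple[List[int], List[int]]:
    """Return (after_backup, during_outage) for timestamps in minutes.

    after_backup:  written after the last backup but before the app went down;
                   a restore from that backup rolls these back.
    during_outage: sent while the app was down; lost only if the sender
                   does not buffer and retry (an assumption, not a given).
    """
    after_backup = [t for t in event_times if backup_at < t < down_at]
    during_outage = [t for t in event_times if down_at <= t < restored_at]
    return after_backup, during_outage

if __name__ == "__main__":
    # Hypothetical: one log event every 5 minutes for an hour.
    events = list(range(0, 60, 5))
    print(lost_events(events, backup_at=9, down_at=16, restored_at=40))
    # ([10, 15], [20, 25, 30, 35])
```

In this example the backup-to-downtime window costs only two events, while the outage itself accounts for four, which is one way a long unscheduled downtime can show up as "data loss" for a logging app.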