8.3.0: postgres upgrade failure
-
For me, all apps that use the Postgres database ran into an error after the upgrade. The task that failed reported, for example:
An error occurred during the operation: : Error setting up postgresql. Status code: 500 message: the database system is in recovery mode
Some apps, such as Chatwoot, were happy with just retrying the task, while PeerTube still complains about the OIDC plugin and Nextcloud about oc_appconfig missing. Still investigating on my end.
Edit: as for Nextcloud, it seems all tables were dropped. Unfortunately, the restore of the Nextcloud app fails with:
Task Error: Task 19659 crashed with code 3
The last thing I see in the web log viewer is the file restore reaching 65%.
-
@fbartels Oof, that's not good. I think it might be easiest to start over with the postgres imports. Before the upgrade, Cloudron makes full dumps (in addition to the backups). Have you already tried the following?
systemctl restart box
This will retrigger an import if the previous imports failed.
-
girish marked this topic as a question
-
Ah, it seems the restore is failing because of memory limits.
2025-03-11T09:52:46.768Z box:shell <--- Last few GCs --->
[190237:0x38f032d0] 122923 ms: Scavenge 250.0 (257.7) -> 249.5 (258.0) MB, 1.20 / 0.00 ms (average mu = 0.249, current mu = 0.130) allocation failure;
[190237:0x38f032d0] 122934 ms: Scavenge 250.2 (258.0) -> 249.7 (258.2) MB, 1.16 / 0.00 ms (average mu = 0.249, current mu = 0.130) allocation failure;
[190237:0x38f032d0] 123102 ms: Mark-Compact 250.3 (258.2) -> 248.4 (258.2) MB, 159.94 / 0.00 ms (average mu = 0.304, current mu = 0.366) task; scavenge might not succeed
<--- JS stacktrace --->
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----
2025-03-11T09:52:46.772Z box:shell 1: 0xb8ced1 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
2025-03-11T09:52:46.773Z box:shell 2: 0xf06460 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
2025-03-11T09:52:46.773Z box:shell 3: 0xf06747 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
2025-03-11T09:52:46.774Z box:shell 4: 0x11182e5 [node]
2025-03-11T09:52:46.774Z box:shell 5: 0x1118874 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [node]
2025-03-11T09:52:46.775Z box:shell 6: 0x112f764 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [node]
2025-03-11T09:52:46.775Z box:shell 7: 0x112ff7c v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
2025-03-11T09:52:46.776Z box:shell 8: 0x1106281 v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
2025-03-11T09:52:46.777Z box:shell 9: 0x1107415 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
2025-03-11T09:52:46.778Z box:shell 10: 0x10e4a66 v8::internal::Factory::NewFillerObject(int, v8::internal::AllocationAlignment, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
2025-03-11T09:52:46.779Z box:shell 11: 0x1540896 v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
2025-03-11T09:52:46.779Z box:shell 12: 0x1979ef6 [node]
2025-03-11T09:52:46.926Z box:shell Finished with result: core-dump
Main processes terminated with: code=dumped/status=ABRT
Service runtime: 2min 3.531s
CPU time consumed: 2min 51.745s
2025-03-11T09:52:46.927Z box:shell Service box-task-19660 failed to run
2025-03-11T09:52:46.932Z box:shell Service box-task-19660 finished with exit code 3
2025-03-11T09:52:46.935Z box:shell tasks: /usr/bin/sudo -S -E /home/yellowtent/box/src/scripts/starttask.sh 19660 /home/yellowtent/platformdata/logs/dd5f0f98-2b81-495d-8828-9c967128304a/apptask.log 15 400 0 errored BoxError: tasks exited with code 3 signal null
    at ChildProcess.<anonymous> (/home/yellowtent/box/src/shell.js:137:19)
    at ChildProcess.emit (node:events:519:28)
    at ChildProcess._handle.onexit (node:internal/child_process:294:12) {
  reason: 'Shell Error',
  details: {},
  code: 3,
  signal: null
}
2025-03-11T09:52:46.935Z box:tasks startTask: 19660 completed with code 3
I will need to keep checking, and may do a manual postgres restore later this evening.
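For anyone who wants to check their own logs for the same failure, here is a minimal sketch that scans log text for the Node.js out-of-memory message and for failed box tasks. The function names are made up for illustration, and the patterns are taken only from the log lines quoted in this post, so other Cloudron or Node.js versions may word them differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the patterns come from the log excerpt in this thread.

# Succeeds (exit 0) if the log on stdin contains the Node.js heap OOM message.
has_heap_oom() {
    grep -q 'JavaScript heap out of memory'
}

# Prints the ids of box tasks that finished with a non-zero exit code,
# one id per line, without duplicates.
failed_task_ids() {
    grep -oE 'box-task-[0-9]+ finished with exit code [1-9][0-9]*' \
        | sed -E 's/^box-task-([0-9]+) .*/\1/' \
        | sort -u
}
```

Both functions read from stdin, so they can be fed a saved log file, for example `has_heap_oom < path/to/apptask.log && echo "task ran out of heap"`, where the path is a placeholder for wherever the log was saved.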
-
@fbartels if it helps in debugging, the database import is done in the main box process. It is not done in a task (i.e. a separate out-of-process job). Can you check if the import itself completed? If it did, it means that the db upgrades completed fine. From the logs you posted, it seems some task crashed. The apps are "reconfigured" after database upgrades. I am not sure why this runs out of memory. I must be missing something.
-
I could not let go of it, so I kept digging a bit. I had previously reset the memory limit back to its default, but bumped it to 6 GB now for another test. This time the restore of Nextcloud went through. So now everything is operational again (and I also cleaned out some apps that had errors but that I was not using anymore anyway).
To get back to a working state, I needed to restore each app using a postgres database to its state from before the upgrade to 8.3.0. Otherwise the apps would have no data in their databases (Mastodon users and toots gone, Matrix rooms gone, for example).
-
fbartels has marked this topic as solved