Nextcloud in Error state even though it's running (after Cloudron 5.5 update)

msbt

Experiencing an odd behaviour on one of my servers with a 700GB Nextcloud instance. The dashboard/app info says "Error : - Error restoring postgresql. Status code: 500 message: Failed to import database. Code: 3"

Restarting the app didn't change anything, and stopping it doesn't work because it's in an erroneous state.

Error logs show this:

Aug 13 01:12:04 box:tasks setCompleted - 4453: {"result":null,"error":{"stack":"BoxError: Unknown install command in apptask:error\n at /home/yellowtent/box/src/apptask.js:1070:29\n at /home/yellowtent/box/src/apps.js:520:13\n at Query.<anonymous> (/home/yellowtent/box/src/appdb.js:147:13)\n at Query.<anonymous> (/home/yellowtent/box/node_modules/mysql/lib/Connection.js:526:10)\n at Query._callback (/home/yellowtent/box/node_modules/mysql/lib/Connection.js:488:16)\n at Query.Sequence.end (/home/yellowtent/box/node_modules/mysql/lib/protocol/sequences/Sequence.js:83:24)\n at Query._handleFinalResultPacket (/home/yellowtent/box/node_modules/mysql/lib/protocol/sequences/Query.js:149:8)\n at Query.EofPacket (/home/yellowtent/box/node_modules/mysql/lib/protocol/sequences/Query.js:133:8)\n at Protocol._parsePacket (/home/yellowtent/box/node_modules/mysql/lib/protocol/Protocol.js:291:23)\n at Parser._parsePacket (/home/yellowtent/box/node_modules/mysql/lib/protocol/Parser.js:433:10)","name":"BoxError","reason":"Internal Error","details":{},"message":"Unknown install command in apptask:error"}}

Aug 13 01:12:04 box:tasks 4453: {"percent":100,"result":null,"error":{"stack":"BoxError: Unknown install command in apptask:error\n at /home/yellowtent/box/src/apptask.js:1070:29\n at /home/yellowtent/box/src/apps.js:520:13\n at Query.<anonymous> (/home/yellowtent/box/src/appdb.js:147:13)\n at Query.<anonymous> (/home/yellowtent/box/node_modules/mysql/lib/Connection.js:526:10)\n at Query._callback (/home/yellowtent/box/node_modules/mysql/lib/Connection.js:488:16)\n at Query.Sequence.end (/home/yellowtent/box/node_modules/mysql/lib/protocol/sequences/Sequence.js:83:24)\n at Query._handleFinalResultPacket (/home/yellowtent/box/node_modules/mysql/lib/protocol/sequences/Query.js:149:8)\n at Query.EofPacket (/home/yellowtent/box/node_modules/mysql/lib/protocol/sequences/Query.js:133:8)\n at Protocol._parsePacket (/home/yellowtent/box/node_modules/mysql/lib/protocol/Protocol.js:291:23)\n at Parser._parsePacket (/home/yellowtent/box/node_modules/mysql/lib/protocol/Parser.js:433:10)","name":"BoxError","reason":"Internal Error","details":{},"message":"Unknown install command in apptask:error"}}

Not sure if those are related, but the app is still up and running. Any suggestions on what to do?
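For anyone reading entries like these later: the interesting part is the JSON object after the task id. Below is a minimal Python sketch that pulls the error out of one such line. It assumes the JSON payload starts at the first `{` on the line; the helper name `extract_task_error` is made up for illustration and is not part of Cloudron. The sample line is a shortened copy of the one above.

```python
import json


def extract_task_error(line):
    """Return the 'error' object embedded in a box:tasks log line, or None.

    Assumption: the JSON payload begins at the first '{' on the line.
    """
    start = line.find("{")
    if start == -1:
        return None
    try:
        payload = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    return payload.get("error")


# Shortened copy of the log line quoted above (stack trace cut to one frame).
sample = (
    r'Aug 13 01:12:04 box:tasks setCompleted - 4453: {"result":null,"error":'
    r'{"stack":"BoxError: Unknown install command in apptask:error\n'
    r' at /home/yellowtent/box/src/apptask.js:1070:29",'
    r'"name":"BoxError","reason":"Internal Error","details":{},'
    r'"message":"Unknown install command in apptask:error"}}'
)

error = extract_task_error(sample)
if error is not None:
    print(error["reason"], "-", error["message"])
    # The stack is a single string with embedded newlines; print one frame per line.
    for frame in error["stack"].splitlines():
        print("   ", frame.strip())
```

Both log lines above carry the same error object, so they most likely describe a single failure of task 4453 rather than two separate problems.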

nebulon

Did you attempt to retry the restore in the repair section of the app configure view?

msbt

Just had a quick restore session with @girish. He suggested that even though postgres had 3.5GB of RAM available, this still wasn't enough to import/migrate a 400MB+ dump of the database. We upped the limit to 4GB and did another restore, which fixed the app status. I have rescanned the files now and am waiting for feedback on whether anything else is missing.

girish

What happened was that the db migration failed because postgres wanted more memory. What I did was to give it more memory and trigger an in-place import. That did the trick.

necrevistonnezr

Something is not right with my Nextcloud instance, either, after the Cloudron 5.5 update. I had to increase the memory to 8 GB and CPU to 50 %, otherwise the app was in a "not responding" state.

All clients (mac, PC, iOS) are in an endless loop to sync but never actually do. My Nextcloud website takes forever to load (all other Cloudron services like FreshRSS are fine). I re-setup the iOS client which takes forever. After entering credentials on the login dialog, I'm not being redirected to the app but I see a webview of Nextcloud.

The Nextcloud logs don't show anything odd at first glance except this:

"Aug 18 09:23:29 [Tue Aug 18 07:23:29.826254 2020] [rewrite:error] [pid 8495] [client 172.18.0.1:50318] AH00670: Options FollowSymLinks and SymLinksIfOwnerMatch are both off, so the RewriteRule directive is also forbidden due to its similar ability to circumvent directory restrictions : /app/code/config"

and (?)

Aug 18 09:27:12 58:C 18 Aug 07:27:12.149 * DB saved on disk
Aug 18 09:27:12 58:C 18 Aug 07:27:12.159 * RDB: 0 MB of memory used by copy-on-write
Aug 18 09:27:12 15:M 18 Aug 07:27:12.220 * Background saving terminated with success

necrevistonnezr

Also: Suddenly there's a new folder "uploads" that wasn't there before and that I didn't create.

necrevistonnezr

I think the culprit is PostgreSQL 11 - was that recently changed in the Nextcloud Docker? My CPU runs at 100 % the whole time....

Anmerkung 2020-08-18 153717.png

girish

Yes, Cloudron moved to Postgres 11 in the previous release (Cloudron 5.5). Can you just try restarting Postgres under services?

Another thing: in /home/yellowtent/platformdata/logs/box.log, do you see an error like "Error importing postgresql"?
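If the log is too long to read by eye, a short script can do the search. This is only a sketch: the log path and the search phrase come from the post above, while the function names `find_import_errors` and `scan_file` are made up for illustration. The match is case-insensitive, on the assumption that the exact capitalisation in the log may differ.

```python
def find_import_errors(lines, needle="Error importing postgresql"):
    """Return every line that contains the needle, ignoring case."""
    needle = needle.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


def scan_file(path="/home/yellowtent/platformdata/logs/box.log"):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_import_errors(handle)
```

Calling `scan_file()` on the server (with enough permissions to read the log) returns the matching lines; an empty list means the phrase does not appear in that file.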

girish

@necrevistonnezr said in Nextcloud in Error state even though it's running (after Cloudron 5.5 update):

Aug 18 09:27:12 58:C 18 Aug 07:27:12.149 * DB saved on disk
Aug 18 09:27:12 58:C 18 Aug 07:27:12.159 * RDB: 0 MB of memory used by copy-on-write
Aug 18 09:27:12 15:M 18 Aug 07:27:12.220 * Background saving terminated with success

This one is from redis, you can ignore it.

necrevistonnezr

@girish said in Nextcloud in Error state even though it's running (after Cloudron 5.5 update):

Yes, Cloudron moved to Postgres 11 in the previous release (Cloudron 5.5). Can you just try restarting Postgres under services?

Another thing: in /home/yellowtent/platformdata/logs/box.log, do you see an error like "Error importing postgresql"?

No error in box.log
After restarting Postgres, it immediately goes back to 100 % CPU.

Anmerkung 2020-08-18 162033.png

girish

Just to narrow the issue down, if you stop the nextcloud app, does the postgresql cpu usage go back to normal? From the screenshot it seems it's busy in some SELECT command.
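To compare before and after stopping the app without relying on a screenshot, something like the following Python sketch can list the busiest postgres backends. It assumes a Linux host where `ps -eo pcpu,args` is available and where backend processes show a title beginning with `postgres:` followed by the current activity (for example SELECT). The function names are made up for illustration.

```python
import subprocess


def busiest_postgres_backends(ps_output, limit=5):
    """Parse `ps -eo pcpu,args` output and return (cpu, title) pairs
    for postgres processes, highest CPU first."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # first line is the ps header
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue
        cpu_text, args = parts
        if not args.startswith("postgres:"):
            continue
        try:
            cpu = float(cpu_text)
        except ValueError:
            continue
        rows.append((cpu, args))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:limit]


def snapshot(limit=5):
    """Run ps on this machine and return the busiest postgres backends."""
    result = subprocess.run(
        ["ps", "-eo", "pcpu,args"],
        capture_output=True, text=True, check=True,
    )
    return busiest_postgres_backends(result.stdout, limit)
```

Running `snapshot()` once with the app running and once after stopping it gives two lists to compare. Note that the CPU figure reported by ps is an average over the process lifetime, so a freshly restarted backend may take a moment to show its real load.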

necrevistonnezr

@girish said in Nextcloud in Error state even though it's running (after Cloudron 5.5 update):

Just to narrow the issue down, if you stop the nextcloud app, does the postgresql cpu usage go back to normal? From the screenshot it seems it's busy in some SELECT command.

After stopping the app, CPU cores go down to the usual 5-15 %

Anmerkung 2020-08-18 163519.png

girish

@necrevistonnezr Do you think you can stop the existing nextcloud and then maybe clone from the latest backup and check if postgres is still using a lot of CPU? If it works out, maybe you can then just move stopped nextcloud into another domain and then put the cloned one there.

necrevistonnezr

@girish said in Nextcloud in Error state even though it's running (after Cloudron 5.5 update):

@necrevistonnezr Do you think you can stop the existing nextcloud and then maybe clone from the latest backup and check if postgres is still using a lot of CPU? If it works out, maybe you can then just move stopped nextcloud into another domain and then put the cloned one there.

Clone Nextcloud into another subdomain you mean? How do I do that?

EDIT: Found it.

nebulon

What was the root cause if you found it?
On a side note, postgres really gets hammered with SELECTs during, for example, a rescan of nextcloud files.

necrevistonnezr

@nebulon said in Nextcloud in Error state even though it's running (after Cloudron 5.5 update):

What was the root cause if you found it?
On a side note, postgres really gets hammered with SELECTs during, for example, a rescan of nextcloud files.

I meant I found the cloning process, I haven't found the cause for the CPU spikes.
I'm trying to clone a backup to a new subdomain but I don't have enough free space to clone a 300 GB Nextcloud instance...

necrevistonnezr

@girish said in Nextcloud in Error state even though it's running (after Cloudron 5.5 update):

@necrevistonnezr Do you think you can stop the existing nextcloud and then maybe clone from the latest backup and check if postgres is still using a lot of CPU? If it works out, maybe you can then just move stopped nextcloud into another domain and then put the cloned one there.

I did that now. Took 10 hours. Result is the same. 100 % CPU on Postgres on Nextcloud (app id 410c...). This is HUGELY frustrating. And I can't even login, it takes forever.

nebulon

For a start, do you have some nextcloud client running on your laptop or so? Maybe that fires requests like crazy and hammers postgres as a result?

necrevistonnezr

No, I switched off all clients on purpose - and after cloning to a new subdomain, there would be no connection, anyway.

nebulon

Maybe some plugin causes this? Can you open a terminal into the app and use the occ tool to disable some?
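As a rough illustration of what that could look like, the Python sketch below only builds the command lines and does not run anything. `app:list` and `app:disable` are standard occ subcommands. The occ location `/app/code/occ`, the `www-data` user and the `sudo -u` wrapper are assumptions about how the app container is laid out, and `some_app_id` is a placeholder, so check them against your own instance first.

```python
import shlex


def occ_command(*args, occ_path="/app/code/occ", user="www-data"):
    """Build an occ invocation as an argument list.

    Assumptions (not confirmed in this thread): occ lives at occ_path
    and has to be run as `user` through sudo.
    """
    return ["sudo", "-u", user, "php", occ_path, *args]


# List installed apps, then disable one of them by its app id.
list_apps = occ_command("app:list")
disable_one = occ_command("app:disable", "some_app_id")  # placeholder id

print(shlex.join(list_apps))
print(shlex.join(disable_one))
```

Disabling third-party apps one at a time and watching the postgres CPU after each change should show whether a single app is behind the load.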
