Nextcloud in Error state even though it's running (after Cloudron 5.5 update)

girish

@necrevistonnezr Do you think you can stop the existing nextcloud and then maybe clone from the latest backup and check if postgres is still using a lot of CPU? If it works out, maybe you can then just move stopped nextcloud into another domain and then put the cloned one there.

necrevistonnezr

@girish said in Nextcloud in Error state even though it's running (after Cloudron 5.5 update):

@necrevistonnezr Do you think you can stop the existing nextcloud and then maybe clone from the latest backup and check if postgres is still using a lot of CPU? If it works out, maybe you can then just move stopped nextcloud into another domain and then put the cloned one there.

Clone Nextcloud into another subdomain you mean? How do I do that?

EDIT: Found it.

nebulon

What was the root cause if you found it?
On a side note postgres really gets hammered with SELECTs during for example a rescan of nextcloud files.

necrevistonnezr

@nebulon said in Nextcloud in Error state even though it's running (after Cloudron 5.5 update):

What was the root cause if you found it?
On a side note postgres really gets hammered with SELECTs during for example a rescan of nextcloud files.

I meant I found the cloning process, I haven't found the cause for the CPU spikes.
I'm trying go clone a backup to a new subdomain but I don't have enough free space to clone a 300 GB Nextcloud instance...

necrevistonnezr

@girish said in Nextcloud in Error state even though it's running (after Cloudron 5.5 update):

@necrevistonnezr Do you think you can stop the existing nextcloud and then maybe clone from the latest backup and check if postgres is still using a lot of CPU? If it works out, maybe you can then just move stopped nextcloud into another domain and then put the cloned one there.

I did that now. Took 10 hours. Result is the same. 100 % CPU on Postgres on Nextcloud (app id 410c...). This is HUGELY frustrating. And I can't even login, it takes forever.

nebulon

For a start, do you have some nextcloud client running on your laptop or so? Maybe that fires requests like crazy and thus hammering postgres as a result?

necrevistonnezr

No, I switched off all clients on purpose - and after cloning to a new subdomain, there would be no connection, anyway.

nebulon

maybe some plugin causes this? Can you use the occ tool via terminal into the app to disable some?

necrevistonnezr

@nebulon said in Nextcloud in Error state even though it's running (after Cloudron 5.5 update):

maybe some plugin causes this? Can you use the occ tool via terminal into the app to disable some?

I think only the bare minimum is enabled....

occ app:list
Enabled:
  - accessibility: 1.5.0
  - activity: 2.12.0
  - admin_audit: 1.9.0
  - calendar: 2.0.3
  - cloud_federation_api: 1.2.0
  - comments: 1.9.0
  - contacts: 3.3.0
  - contactsinteraction: 1.0.0
  - dav: 1.15.0
  - encryption: 2.7.0
  - federatedfilesharing: 1.9.0
  - files: 1.14.0
  - files_external: 1.10.0
  - files_pdfviewer: 1.8.0
  - files_rightclick: 0.16.0
  - files_sharing: 1.11.0
  - files_trashbin: 1.9.0
  - files_versions: 1.12.0
  - files_videoplayer: 1.8.0
  - firstrunwizard: 2.8.0
  - logreader: 2.4.0
  - lookup_server_connector: 1.7.0
  - nextcloud_announcements: 1.8.0
  - notifications: 2.7.0
  - oauth2: 1.7.0
  - password_policy: 1.9.1
  - photos: 1.1.0
  - privacy: 1.3.0
  - provisioning_api: 1.9.0
  - recommendations: 0.7.0
  - serverinfo: 1.9.0
  - settings: 1.1.0
  - sharebymail: 1.9.0
  - spreed: 9.0.3
  - support: 1.2.1
  - systemtags: 1.9.0
  - text: 3.0.1
  - theming: 1.10.0
  - twofactor_backupcodes: 1.8.0
  - twofactor_totp: 4.1.3
  - updatenotification: 1.9.0
  - user_ldap: 1.9.0
  - viewer: 1.3.0
  - workflowengine: 2.1.0
Disabled:
  - bookmarks
  - bruteforcesettings
  - documentserver_community
  - federation
  - mail
  - maps
  - ransomware_detection
  - survey_client
  - tasks
  - twofactor_admin

nebulon

This is very strange, since if noone accesses nextcloud there shouldn't be long-running processes accessing the database either.

necrevistonnezr

To be clear: Postgres goes crazy if I try to login from a browser or a client...

necrevistonnezr

Can I somehow go back to an earlier Postgres version? This right now is killing my server and my workflow.

nebulon

You would have to reinstall Cloudron for that old version altogether maybe if you enable remote SSH support we could take a more direct look, if so please mail your domain to support@cloudron.io

girish

This is not related to the thread directly (but I was wondering about if we do make db rollbacks even possible). Do you use other apps that use Postgres? I realize this is not immediately obvious and hard to tell . For example, GitLab is now incompatible with the older postgres and then some of the newer apps like loomio require some of the Postgres extensions we have enabled (maybe one of these extensions is causing CPU use). If it's possible, as @nebulon said we can take a look.

girish

@necrevistonnezr Ah ok, I guess all of them are mysql. That does make it easier to debug. Please write when possible, we can look into it asap.

girish

We hit this with another user now and I think the root cause is that the migration only partially imported the database. This is causing nextcloud to do a log of queries (maybe some internal loop).

To fix this (please do all this carefully, if you not are confident just reach out to support@cloudron.io and we can do it for you):

Give the postgresql service a lot more memory (Services -> PostgreSQL). There is no good number for this, just give it as much as you can. It's harmless since you can always scale it down later after the import.
First, identify the backup of the app that was created before the Cloudron updated to 5.5.0. From this backup, copy over the postgresqldump file. Assuming f6e87030-2102-4c6c-b8eb-b2d86a268917 is the id of the nextcloud app:

# cp /home/yellowtent/appsdata/f6e87030-2102-4c6c-b8eb-b2d86a268917/postgresqldump /root/postgresqldump.current

# cp /from/the/app/backups/postgresqldump /home/yellowtent/appsdata/f6e87030-2102-4c6c-b8eb-b2d86a268917/postgresqldump

On your PC/Mac (not Cloudron!), then use the CLI tool to import the data in-place. This command simple re-imports the database that we just copied above.

$ cloudron import --in-place --app nextcloud.domain.com

If you had generated some files in the past few days, you should run the occ scan again - https://cloudron.io/documentation/apps/nextcloud/#rescan-files after nextcloud is running again.

girish

CPU usage after the re-import:

necrevistonnezr

Since I was pressured for time, I re-setup Nextcloud from scratch, imported the backup and went that route. Really stressful and I hope I don't have to do that again. Makes you realize why you pay for certain cloud services...

girish

@necrevistonnezr thanks for the update. We have fixed the code in the meantime that causes this.

necrevistonnezr

@girish Now I know why support didn't work out: Cloudron blocked my answer from my Cloudron mail account to you guys - as mail relay via Sendgrid - as spam.... (!)
FYI: the shown IP 167.89.12.138 does indeed belong to Sendgrid.

Screenshot of SendGrid.jpg

So mail relay via Sendgrid from the Cloudron mail server is not reliable, I guess....

Cloudron makes it easy to run web apps like WordPress, Nextcloud, GitLab on your server. Find out more or install now.

Cloudron Forum

Nextcloud in Error state even though it's running (after Cloudron 5.5 update)