Apps won't come up after update.

mehdi

Happened to me too.

I tried restarting the services from the dashboard. Most of the services became green, but the apps still would not boot. I restarted the server from SSH, and when it rebooted the apps were OK (I think they took a longer time than usual to get back up, but still they came back up, so good enough ^^)

And no, I did not retrieve the logs at the time

murgero

@nebulon I was restarting the service dockers via the dashboard, as for logs the only thing of value (that didn't have personal info) said it could not find images for mongo, mysql, and redis - others could have been in there but that's what stuck out to me.

I'd also like to note: before manually restarting my apps, the services listed above were offline and "not found" for over 5 hours.

girish

@murgero If you can dig into /home/yellowtent/platformdata/logs/box.log and see if there are any errors (just search for 'error' or something), it would help to understand what's happening, so atleast we can avoid it in next release.

While not apparent from the changelog, the 7.1 update is quite a massive update. It upgrades docker, recreates the entire docker network (for internal ipv6), upgrades all database container to support http and new backup/restore streaming mechanism, and also updates all app containers. This is also why after the update, the apps say 'Configuring...'. At this point, it's clear to me that this message is not helpful for end users. So, we will atleast fix the way the UI looks when the app containers are getting re-created.

Mastadamus

@girish I got home and ran systemctl restart box from SSH and it brought up all services and then the apps configured correctly.

gml

@girish To me it looks like mysql is taking too long to start and box is giving up eventually. I had to wait a long time until mysql was ready and listening. That was the moment I kicked the restart of the box service, which got stuff running (still slowly, but moving forward). My apps are now configuring / starting, so it looks fine. Maybe that information helps a bit. Here some box.log output:

Mar 17 21:56:51 box:services Waiting for mysql
Mar 17 21:56:51 box:services Attempt 1 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000
Mar 17 21:57:06 box:services Attempt 2 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000
Mar 17 21:57:21 box:services Attempt 3 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000
Mar 17 21:57:36 box:services Attempt 4 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000
Mar 17 21:57:51 box:services Attempt 5 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000
Mar 17 21:58:06 box:services Attempt 6 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000
Mar 17 21:58:21 box:services Attempt 7 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000
Mar 17 21:58:36 box:services Attempt 8 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000
Mar 17 21:58:51 box:services Attempt 9 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000
Mar 17 21:58:53 box:shell statusNginx exec: systemctl is-active nginx
Mar 17 21:58:53 box:shell statusUnbound exec: systemctl is-active unbound
Mar 17 21:58:53 box:shell statusNginx (stdout): active
Mar 17 21:58:53 box:shell statusNginx (stderr): null
Mar 17 21:58:53 box:shell statusUnbound (stdout): active
Mar 17 21:58:53 box:shell statusUnbound (stderr): null
Mar 17 21:59:06 box:platform Failed to start services. retry=false (attempt 0): Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000
Mar 17 21:59:06 box:cloudron Startup task at index 3 failed: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000
Mar 17 22:01:14 box:locker Lock unreleased platform_start

micmc

@girish said in Apps won't come up after update.:

@Mastadamus Can you look into /home/yellowtent/platformdata/logs/box.log for any errors? You can open a new window and just do systemctl restart box . You can make out there where it is erroring as it starts up the services on by one.

@girish For me, on my production box I've 3 LAMP stacks, one's restarted and is running correctly, but 2 others are refusing to restart, and quite frankly I've hard time to determine what can cause this at this point. I've examined and compared the logs of the 3 stacks when stopping and then starting again from the console, and oddly enough their log lines are identical, the one that works and the two that are not responding. So from there I cannot really see what prevents the two of them from responding.

Now, checking the box log while restarting the box from SSH, as you suggest, I can see the box:scheduler createJobs: recreating the containers for domains for 32 apps, but two are dead and keep: "waiting for xxxx.xxx to update health" and it repeats on, except the waiting for xxxx.xxx number keeps changing. It's the 2 apps that dot respond from the console.

Log sample of when I do "systemctl restart box' all is fine except these culprits:

root@Cloudron-1 ~ # tail /home/yellowtent/platformdata/logs/box.log
2022-03-18T00:44:01.790Z box:scheduler sync: adding jobs of 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742 (LAMPNotResponding-1)
2022-03-18T00:44:01.971Z box:scheduler createJobs: crontab.0 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.0
2022-03-18T00:44:02.139Z box:scheduler createJobs: crontab.1 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.1
2022-03-18T00:44:02.296Z box:scheduler createJobs: crontab.2 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.2
2022-03-18T00:44:02.454Z box:scheduler createJobs: crontab.3 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.3
2022-03-18T00:44:02.655Z box:scheduler createJobs: crontab.4 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.4
2022-03-18T00:44:02.817Z box:scheduler createJobs: crontab.5 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.5
2022-03-18T00:44:02.992Z box:scheduler createJobs: crontab.6 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.6


2022-03-18T00:44:13.113Z box:apphealthmonitor setHealth: 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742 (LAMPNotResponding-1) waiting for 1186.893 to update health
2022-03-18T00:44:13.114Z box:apphealthmonitor setHealth: 202446ed-20d5-4956-b800-ecf44a417991 (LAMPNotResponding-2) waiting for 1186.892 to update health

2022-03-18T00:44:23.096Z box:apphealthmonitor setHealth: 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742 (LAMPNotResponding-1) waiting for 1176.91 to update health
2022-03-18T00:44:23.096Z box:apphealthmonitor setHealth: 202446ed-20d5-4956-b800-ecf44a417991 (LAMPNotResponding-2) waiting for 1176.91 to update health

The apps installed on these LAMP stacks are up to date, however I don't see why the first one has 6 different cronjob files.
Thanks.

nebulon

@micmc the scheduler does not run those tasks, as the apps which they belong to are not in healthy condition. Do you see those apps being marked as "not responding" in your dashboard? If so, can you take at those app logs, and make sure they respond with a 200 status code to /.

micmc

@nebulon said in Apps won't come up after update.:

@micmc the scheduler does not run those tasks, as the apps which they belong to are not in healthy condition. Do you see those apps being marked as "not responding" in your dashboard? If so, can you take at those app logs, and make sure they respond with a 200 status code to /.

Yes, they're marked as "not responding", two of them.

As I explained, I've 3 LAMP stacks but only 2 are "not responding" I'd took a look at first at the app's log for the 3 of them to compare but now I've downloaded the full log for one that does not respond.

It was responding with status 200 until Marsh 14 then this suddenly appears some weird stuff in apptask.log:


2022-03-14T06:02:00.000Z 172.18.0.1 - - [14/Mar/2022:06:02:00 +0000] "GET / HTTP/1.1" 200 2736 "-" "Mozilla (CloudronHealth)"

2022-03-14T06:02:07.000Z 2022-03-14 06:02:07,856 WARN received SIGTERM indicating exit request
2022-03-14T06:02:07.000Z 2022-03-14 06:02:07,856 INFO waiting for cron, apache2 to die
2022-03-14T06:02:07.000Z 2022-03-14 06:02:07,860 INFO stopped: apache2 (terminated by SIGTERM)
2022-03-14T06:02:07.000Z 2022-03-14 06:02:07,862 INFO stopped: cron (terminated by SIGTERM)
2022-03-14T06:02:28.000Z ==> Do not override existing index file
2022-03-14T06:02:29.000Z ==> Running custom startup script
2022-03-14T06:02:29.000Z ==> Imported crontab
2022-03-14T06:02:29.000Z ==> Creating credentials.txt
2022-03-14T06:02:35.000Z ==> Starting Lamp stack
2022-03-14T06:02:36.000Z 2022-03-14 06:02:36,128 CRIT Supervisor running as root (no user in config file)
2022-03-14T06:02:36.000Z 2022-03-14 06:02:36,129 INFO Included extra file "/etc/supervisor/conf.d/apache2.conf" during parsing
2022-03-14T06:02:36.000Z 2022-03-14 06:02:36,130 INFO Included extra file "/etc/supervisor/conf.d/cron.conf" during parsing
2022-03-14T06:02:36.000Z 2022-03-14 06:02:36,161 INFO RPC interface 'supervisor' initialized
2022-03-14T06:02:36.000Z 2022-03-14 06:02:36,161 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2022-03-14T06:02:36.000Z 2022-03-14 06:02:36,163 INFO supervisord started with pid 1
2022-03-14T06:02:37.000Z 2022-03-14 06:02:37,166 INFO spawned: 'cron' with pid 20
2022-03-14T06:02:37.000Z 2022-03-14 06:02:37,168 INFO spawned: 'apache2' with pid 21
2022-03-14T06:02:38.000Z AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.xx.xx.153. Set the 'ServerName' directive globally to suppress this message
2022-03-14T06:02:39.000Z 2022-03-14 06:02:39,097 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2022-03-14T06:02:39.000Z 2022-03-14 06:02:39,097 INFO success: apache2 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2022-03-14T06:02:39.000Z AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.xx.xx.153. Set the 'ServerName' directive globally to suppress this message
2022-03-14T06:02:39.000Z [Mon Mar 14 06:02:39.354214 2022] [mpm_prefork:notice] [pid 22] AH00163: Apache/2.4.29 (Ubuntu) mod_perl/2.0.10 Perl/v5.26.1 configured -- resuming normal operations
2022-03-14T06:02:39.000Z [Mon Mar 14 06:02:39.354266 2022] [core:notice] [pid 22] AH00094: Command line: '/usr/sbin/apache2 -D FOREGROUND'
2022-03-14T06:02:55.000Z ==> Do not override existing index file

The above sequence repeats itself several times then suddenly appears the following only repeating itself for thousands of lines:

2022-03-14T06:13:11.000Z ==> Do not override existing index file
2022-03-14T06:13:11.000Z ==> Running custom startup script
2022-03-14T06:13:11.000Z crontabs: No such file or directory
2022-03-14T06:13:11.000Z crontabs: created
2022-03-14T06:13:11.000Z crontabs: chowned
2022-03-14T06:13:12.000Z ==> Imported crontab
2022-03-14T06:13:12.000Z ==> Creating credentials.txt
2022-03-14T06:13:12.000Z /app/code/start.sh: line 47: MYSQL_HOST: unbound variable
2022-03-14T06:13:11.000Z ==> Do not override existing index file
2022-03-14T06:13:11.000Z ==> Running custom startup script
2022-03-14T06:13:11.000Z crontabs: No such file or directory
2022-03-14T06:13:11.000Z crontabs: created
2022-03-14T06:13:11.000Z crontabs: chowned
2022-03-14T06:13:12.000Z ==> Imported crontab
2022-03-14T06:13:12.000Z ==> Creating credentials.txt
2022-03-14T06:13:12.000Z /app/code/start.sh: line 47: MYSQL_HOST: unbound variable

2022-03-18T16:50:44.000Z ==> Do not override existing index file
2022-03-18T16:50:44.000Z ==> Running custom startup script
2022-03-18T16:50:44.000Z ==> Imported crontab
2022-03-18T16:50:44.000Z ==> Creating credentials.txt
2022-03-18T16:50:44.000Z /app/code/start.sh: line 47: MYSQL_HOST: unbound variable

Then in app.log, we see the 'supervisor' stuff yet again to no end:

2022-03-17T22:44:49.000Z Starting supervisor
2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,326 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,329 INFO Included extra file "/etc/supervisor/conf.d/redis-service.conf" during parsing
2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,329 INFO Included extra file "/etc/supervisor/conf.d/redis.conf" during parsing
2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,346 INFO RPC interface 'supervisor' initialized
2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,346 CRIT Server 'inet_http_server' running without any HTTP authentication checking
2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,346 INFO RPC interface 'supervisor' initialized
2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,346 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,347 INFO supervisord started with pid 1
2022-03-17T22:44:50.000Z 2022-03-17 22:44:50,348 INFO spawned: 'redis' with pid 14
2022-03-17T22:44:50.000Z 2022-03-17 22:44:50,350 INFO spawned: 'redis-service' with pid 15
2022-03-17T22:44:50.000Z 14:C 17 Mar 2022 22:44:50.357 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
2022-03-17T22:44:50.000Z 14:C 17 Mar 2022 22:44:50.357 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=14, just started
2022-03-17T22:44:50.000Z 14:C 17 Mar 2022 22:44:50.357 # Configuration loaded
2022-03-17T22:44:50.000Z 14:M 17 Mar 2022 22:44:50.360 * Running mode=standalone, port=6379.
2022-03-17T22:44:50.000Z 14:M 17 Mar 2022 22:44:50.360 # Server initialized
2022-03-17T22:44:50.000Z 14:M 17 Mar 2022 22:44:50.360 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
2022-03-17T22:44:50.000Z 14:M 17 Mar 2022 22:44:50.362 * DB loaded from disk: 0.002 seconds
2022-03-17T22:44:50.000Z 14:M 17 Mar 2022 22:44:50.362 * Ready to accept connections
2022-03-17T22:44:50.000Z Redis service endpoint listening on http://:::3000
2022-03-17T22:44:51.000Z 2022-03-17 22:44:51,638 INFO success: redis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2022-03-17T22:44:51.000Z 2022-03-17 22:44:51,638 INFO success: redis-service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

Looks like redis is running and backup (for the app) as well. Not sure about the memory set to 0 warning as it's not the case in the app.

nebulon

@micmc ah so you hit the same issue with outdated LAMP app instances: https://forum.cloudron.io/topic/6612/lamp-unbound-variable-mysql

privsec

This happened to me to, and I had to do a full reboot to get the apps back up.

Cloudron makes it easy to run web apps like WordPress, Nextcloud, GitLab on your server. Find out more or install now.

Cloudron Forum

Apps won't come up after update.