Apps won't come up after update.
-
Happened to me too.
I tried restarting the services from the dashboard. Most of the services became green, but the apps still would not boot. I restarted the server from SSH, and when it rebooted the apps were OK (I think they took a longer time than usual to get back up, but still they came back up, so good enough ^^)
And no, I did not retrieve the logs at the time
-
@nebulon I was restarting the service dockers via the dashboard, as for logs the only thing of value (that didn't have personal info) said it could not find images for mongo, mysql, and redis - others could have been in there but that's what stuck out to me.
I'd also like to note: before manually restarting my apps, the services listed above were offline and "not found" for over 5 hours.
-
@murgero If you can dig into
/home/yellowtent/platformdata/logs/box.log
and see if there are any errors (just search for 'error' or something), it would help to understand what's happening, so atleast we can avoid it in next release.While not apparent from the changelog, the 7.1 update is quite a massive update. It upgrades docker, recreates the entire docker network (for internal ipv6), upgrades all database container to support http and new backup/restore streaming mechanism, and also updates all app containers. This is also why after the update, the apps say 'Configuring...'. At this point, it's clear to me that this message is not helpful for end users. So, we will atleast fix the way the UI looks when the app containers are getting re-created.
-
@girish To me it looks like mysql is taking too long to start and box is giving up eventually. I had to wait a long time until mysql was ready and listening. That was the moment I kicked the restart of the box service, which got stuff running (still slowly, but moving forward). My apps are now configuring / starting, so it looks fine. Maybe that information helps a bit. Here some box.log output:
Mar 17 21:56:51 box:services Waiting for mysql Mar 17 21:56:51 box:services Attempt 1 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000 Mar 17 21:57:06 box:services Attempt 2 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000 Mar 17 21:57:21 box:services Attempt 3 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000 Mar 17 21:57:36 box:services Attempt 4 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000 Mar 17 21:57:51 box:services Attempt 5 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000 Mar 17 21:58:06 box:services Attempt 6 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000 Mar 17 21:58:21 box:services Attempt 7 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000 Mar 17 21:58:36 box:services Attempt 8 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000 Mar 17 21:58:51 box:services Attempt 9 failed. Will retry: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000 Mar 17 21:58:53 box:shell statusNginx exec: systemctl is-active nginx Mar 17 21:58:53 box:shell statusUnbound exec: systemctl is-active unbound Mar 17 21:58:53 box:shell statusNginx (stdout): active Mar 17 21:58:53 box:shell statusNginx (stderr): null Mar 17 21:58:53 box:shell statusUnbound (stdout): active Mar 17 21:58:53 box:shell statusUnbound (stderr): null Mar 17 21:59:06 box:platform Failed to start services. retry=false (attempt 0): Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000 Mar 17 21:59:06 box:cloudron Startup task at index 3 failed: Network error waiting for mysql: connect ECONNREFUSED 172.18.0.3:3000 Mar 17 22:01:14 box:locker Lock unreleased platform_start
-
@girish said in Apps won't come up after update.:
@Mastadamus Can you look into
/home/yellowtent/platformdata/logs/box.log
for any errors? You can open a new window and just dosystemctl restart box
. You can make out there where it is erroring as it starts up the services on by one.@girish For me, on my production box I've 3 LAMP stacks, one's restarted and is running correctly, but 2 others are refusing to restart, and quite frankly I've hard time to determine what can cause this at this point. I've examined and compared the logs of the 3 stacks when stopping and then starting again from the console, and oddly enough their log lines are identical, the one that works and the two that are not responding. So from there I cannot really see what prevents the two of them from responding.
Now, checking the box log while restarting the box from SSH, as you suggest, I can see the box:scheduler createJobs: recreating the containers for domains for 32 apps, but two are dead and keep: "waiting for xxxx.xxx to update health" and it repeats on, except the waiting for xxxx.xxx number keeps changing. It's the 2 apps that dot respond from the console.
Log sample of when I do "systemctl restart box' all is fine except these culprits:
root@Cloudron-1 ~ # tail /home/yellowtent/platformdata/logs/box.log 2022-03-18T00:44:01.790Z box:scheduler sync: adding jobs of 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742 (LAMPNotResponding-1) 2022-03-18T00:44:01.971Z box:scheduler createJobs: crontab.0 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.0 2022-03-18T00:44:02.139Z box:scheduler createJobs: crontab.1 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.1 2022-03-18T00:44:02.296Z box:scheduler createJobs: crontab.2 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.2 2022-03-18T00:44:02.454Z box:scheduler createJobs: crontab.3 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.3 2022-03-18T00:44:02.655Z box:scheduler createJobs: crontab.4 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.4 2022-03-18T00:44:02.817Z box:scheduler createJobs: crontab.5 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.5 2022-03-18T00:44:02.992Z box:scheduler createJobs: crontab.6 (LAMPNotResponding-1) will run in container 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742-crontab.6 2022-03-18T00:44:13.113Z box:apphealthmonitor setHealth: 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742 (LAMPNotResponding-1) waiting for 1186.893 to update health 2022-03-18T00:44:13.114Z box:apphealthmonitor setHealth: 202446ed-20d5-4956-b800-ecf44a417991 (LAMPNotResponding-2) waiting for 1186.892 to update health 2022-03-18T00:44:23.096Z box:apphealthmonitor setHealth: 8c6895b7-e04d-49c4-8a61-7aa5b8b1a742 (LAMPNotResponding-1) waiting for 1176.91 to update health 2022-03-18T00:44:23.096Z box:apphealthmonitor setHealth: 202446ed-20d5-4956-b800-ecf44a417991 (LAMPNotResponding-2) waiting for 1176.91 to update health
The apps installed on these LAMP stacks are up to date, however I don't see why the first one has 6 different cronjob files.
Thanks. -
@micmc the scheduler does not run those tasks, as the apps which they belong to are not in healthy condition. Do you see those apps being marked as "not responding" in your dashboard? If so, can you take at those app logs, and make sure they respond with a 200 status code to
/
. -
@nebulon said in Apps won't come up after update.:
@micmc the scheduler does not run those tasks, as the apps which they belong to are not in healthy condition. Do you see those apps being marked as "not responding" in your dashboard? If so, can you take at those app logs, and make sure they respond with a 200 status code to
/
.Yes, they're marked as "not responding", two of them.
As I explained, I've 3 LAMP stacks but only 2 are "not responding" I'd took a look at first at the app's log for the 3 of them to compare but now I've downloaded the full log for one that does not respond.
It was responding with status 200 until Marsh 14 then this suddenly appears some weird stuff in apptask.log:
2022-03-14T06:02:00.000Z 172.18.0.1 - - [14/Mar/2022:06:02:00 +0000] "GET / HTTP/1.1" 200 2736 "-" "Mozilla (CloudronHealth)" 2022-03-14T06:02:07.000Z 2022-03-14 06:02:07,856 WARN received SIGTERM indicating exit request 2022-03-14T06:02:07.000Z 2022-03-14 06:02:07,856 INFO waiting for cron, apache2 to die 2022-03-14T06:02:07.000Z 2022-03-14 06:02:07,860 INFO stopped: apache2 (terminated by SIGTERM) 2022-03-14T06:02:07.000Z 2022-03-14 06:02:07,862 INFO stopped: cron (terminated by SIGTERM) 2022-03-14T06:02:28.000Z ==> Do not override existing index file 2022-03-14T06:02:29.000Z ==> Running custom startup script 2022-03-14T06:02:29.000Z ==> Imported crontab 2022-03-14T06:02:29.000Z ==> Creating credentials.txt 2022-03-14T06:02:35.000Z ==> Starting Lamp stack 2022-03-14T06:02:36.000Z 2022-03-14 06:02:36,128 CRIT Supervisor running as root (no user in config file) 2022-03-14T06:02:36.000Z 2022-03-14 06:02:36,129 INFO Included extra file "/etc/supervisor/conf.d/apache2.conf" during parsing 2022-03-14T06:02:36.000Z 2022-03-14 06:02:36,130 INFO Included extra file "/etc/supervisor/conf.d/cron.conf" during parsing 2022-03-14T06:02:36.000Z 2022-03-14 06:02:36,161 INFO RPC interface 'supervisor' initialized 2022-03-14T06:02:36.000Z 2022-03-14 06:02:36,161 CRIT Server 'unix_http_server' running without any HTTP authentication checking 2022-03-14T06:02:36.000Z 2022-03-14 06:02:36,163 INFO supervisord started with pid 1 2022-03-14T06:02:37.000Z 2022-03-14 06:02:37,166 INFO spawned: 'cron' with pid 20 2022-03-14T06:02:37.000Z 2022-03-14 06:02:37,168 INFO spawned: 'apache2' with pid 21 2022-03-14T06:02:38.000Z AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.xx.xx.153. Set the 'ServerName' directive globally to suppress this message 2022-03-14T06:02:39.000Z 2022-03-14 06:02:39,097 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2022-03-14T06:02:39.000Z 2022-03-14 06:02:39,097 INFO success: apache2 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2022-03-14T06:02:39.000Z AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 172.xx.xx.153. Set the 'ServerName' directive globally to suppress this message 2022-03-14T06:02:39.000Z [Mon Mar 14 06:02:39.354214 2022] [mpm_prefork:notice] [pid 22] AH00163: Apache/2.4.29 (Ubuntu) mod_perl/2.0.10 Perl/v5.26.1 configured -- resuming normal operations 2022-03-14T06:02:39.000Z [Mon Mar 14 06:02:39.354266 2022] [core:notice] [pid 22] AH00094: Command line: '/usr/sbin/apache2 -D FOREGROUND' 2022-03-14T06:02:55.000Z ==> Do not override existing index file
The above sequence repeats itself several times then suddenly appears the following only repeating itself for thousands of lines:
2022-03-14T06:13:11.000Z ==> Do not override existing index file 2022-03-14T06:13:11.000Z ==> Running custom startup script 2022-03-14T06:13:11.000Z crontabs: No such file or directory 2022-03-14T06:13:11.000Z crontabs: created 2022-03-14T06:13:11.000Z crontabs: chowned 2022-03-14T06:13:12.000Z ==> Imported crontab 2022-03-14T06:13:12.000Z ==> Creating credentials.txt 2022-03-14T06:13:12.000Z /app/code/start.sh: line 47: MYSQL_HOST: unbound variable 2022-03-14T06:13:11.000Z ==> Do not override existing index file 2022-03-14T06:13:11.000Z ==> Running custom startup script 2022-03-14T06:13:11.000Z crontabs: No such file or directory 2022-03-14T06:13:11.000Z crontabs: created 2022-03-14T06:13:11.000Z crontabs: chowned 2022-03-14T06:13:12.000Z ==> Imported crontab 2022-03-14T06:13:12.000Z ==> Creating credentials.txt 2022-03-14T06:13:12.000Z /app/code/start.sh: line 47: MYSQL_HOST: unbound variable 2022-03-18T16:50:44.000Z ==> Do not override existing index file 2022-03-18T16:50:44.000Z ==> Running custom startup script 2022-03-18T16:50:44.000Z ==> Imported crontab 2022-03-18T16:50:44.000Z ==> Creating credentials.txt 2022-03-18T16:50:44.000Z /app/code/start.sh: line 47: MYSQL_HOST: unbound variable
Then in app.log, we see the 'supervisor' stuff yet again to no end:
2022-03-17T22:44:49.000Z Starting supervisor 2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,326 CRIT Supervisor is running as root. Privileges were not dropped because no user is specified in the config file. If you intend to run as root, you can set user=root in the config file to avoid this message. 2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,329 INFO Included extra file "/etc/supervisor/conf.d/redis-service.conf" during parsing 2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,329 INFO Included extra file "/etc/supervisor/conf.d/redis.conf" during parsing 2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,346 INFO RPC interface 'supervisor' initialized 2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,346 CRIT Server 'inet_http_server' running without any HTTP authentication checking 2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,346 INFO RPC interface 'supervisor' initialized 2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,346 CRIT Server 'unix_http_server' running without any HTTP authentication checking 2022-03-17T22:44:49.000Z 2022-03-17 22:44:49,347 INFO supervisord started with pid 1 2022-03-17T22:44:50.000Z 2022-03-17 22:44:50,348 INFO spawned: 'redis' with pid 14 2022-03-17T22:44:50.000Z 2022-03-17 22:44:50,350 INFO spawned: 'redis-service' with pid 15 2022-03-17T22:44:50.000Z 14:C 17 Mar 2022 22:44:50.357 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo 2022-03-17T22:44:50.000Z 14:C 17 Mar 2022 22:44:50.357 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=14, just started 2022-03-17T22:44:50.000Z 14:C 17 Mar 2022 22:44:50.357 # Configuration loaded 2022-03-17T22:44:50.000Z 14:M 17 Mar 2022 22:44:50.360 * Running mode=standalone, port=6379. 2022-03-17T22:44:50.000Z 14:M 17 Mar 2022 22:44:50.360 # Server initialized 2022-03-17T22:44:50.000Z 14:M 17 Mar 2022 22:44:50.360 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect. 2022-03-17T22:44:50.000Z 14:M 17 Mar 2022 22:44:50.362 * DB loaded from disk: 0.002 seconds 2022-03-17T22:44:50.000Z 14:M 17 Mar 2022 22:44:50.362 * Ready to accept connections 2022-03-17T22:44:50.000Z Redis service endpoint listening on http://:::3000 2022-03-17T22:44:51.000Z 2022-03-17 22:44:51,638 INFO success: redis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2022-03-17T22:44:51.000Z 2022-03-17 22:44:51,638 INFO success: redis-service entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Looks like redis is running and backup (for the app) as well. Not sure about the memory set to 0 warning as it's not the case in the app.