Cloudron instance scaling issues after a few hours / couple of days, apps responsive but showing a permanent "Starting..." status
-
@girish Thanks for this. After looking into box.log:
- After systemctl restart box, I do indeed see a "box:apphealthmonitor app health: xx running / 0 stopped / 0 unresponsive" entry every 10 seconds or so.
- Before systemctl restart box (i.e. while we are experiencing the issue), I see very few of the "box:apphealthmonitor app health: xx" entries. Instead, there are a few rare "box:apphealthmonitor setHealth: <<CONTAINER_UID>> (<<URL>>) waiting for 1192.461 to update health" entries.

Hopefully this helps.
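To check for this pattern without eyeballing the log, here is a minimal sketch that counts the two kinds of apphealthmonitor entries quoted above. It assumes only that the quoted substrings appear verbatim in box.log lines; the helper names (scan_box_log, looks_stuck) and the "fewer than half the expected summaries" threshold are hypothetical, and the 10-second interval comes from the healthy behaviour described in this post, not from Cloudron's source. The meaning and unit of the number in "waiting for 1192.461" are not stated in the thread, so it is kept as a bare number.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Substrings copied from the box.log entries quoted in the post.
# Everything else on a log line (timestamp, prefix) is ignored.
SUMMARY_MARKER = "box:apphealthmonitor app health:"
WAITING_RE = re.compile(
    r"box:apphealthmonitor setHealth: (\S+) \((\S+)\) "
    r"waiting for ([\d.]+) to update health"
)


@dataclass
class HealthMonitorStats:
    summaries: int = 0  # periodic "app health: N running / ..." lines
    # (container id, url, number) parsed from "setHealth ... waiting" lines
    waiting: List[Tuple[str, str, float]] = field(default_factory=list)


def scan_box_log(lines: Iterable[str]) -> HealthMonitorStats:
    """Count the two kinds of apphealthmonitor entries in box.log lines."""
    stats = HealthMonitorStats()
    for line in lines:
        if SUMMARY_MARKER in line:
            stats.summaries += 1
            continue
        match = WAITING_RE.search(line)
        if match:
            container, url, value = match.groups()
            stats.waiting.append((container, url, float(value)))
    return stats


def looks_stuck(stats: HealthMonitorStats, window_seconds: float,
                interval_seconds: float = 10.0) -> bool:
    """Hypothetical heuristic: far fewer summaries than expected for the
    time window (roughly one per interval_seconds when healthy), plus at
    least one setHealth 'waiting' entry."""
    expected = window_seconds / interval_seconds
    return stats.summaries < expected / 2 and bool(stats.waiting)
```

Usage: pass an open box.log file (or any iterable of lines covering a known time window) to scan_box_log, then call looks_stuck with the length of that window in seconds. A healthy 10-minute slice should contain about 60 summary lines; a stuck monitor shows almost none, alongside setHealth waits like the one quoted above.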
-
@uwcrbc I think there is a bug where the apphealthmonitor gets stuck (for some reason). We have seen this happen, but very rarely and not reproducibly. If you can give us access, can you send a mail to support@cloudron.io? I can debug this further.
-
I think I have another instance of this bug with the health monitor.
I have installed an app that does not have health checks and it shows as Not Responding in the dashboard even though it works fine.
However, it seems that because of that app and the health monitor getting stuck, any new apps that get installed or upgraded also fail their health checks and remain in Starting... mode in the dashboard.
I rebooted the server and all the apps came up except the one mentioned above, but after that, any app that gets updated shows the Starting... message.
P.S.
It would be really nice to have the same button for the 'cloudron' service that we have for all the other services.
-
My home server got into a situation like this just now. It seems it's because the eventlog got flooded with many entries.
+----------+
| count(*) |
+----------+
|   563547 |
+----------+

mysql> SELECT action, COUNT(*) AS count
    -> FROM eventlog
    -> GROUP BY action
    -> ORDER BY count DESC;
+---------------------------+--------+
| action                    | count  |
+---------------------------+--------+
| app.up                    | 446592 |
| app.down                  | 106588 |
| backup.cleanup.finish     |   3664 |
| app.update.finish         |   1354 |
| app.update                |   1354 |
| backup.finish             |    920 |
| backup.start              |    920 |
| cloudron.update.finish    |    847 |
| cloudron.update           |    833 |
| cloudron.start            |    110 |
| dyndns.update             |     78 |
Those app up/down eventlogs are out of hand!
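To put a number on it, using only the two query outputs above and assuming they were run at roughly the same time, app.up and app.down together account for about 98% of all eventlog rows:

$$
\frac{446592 + 106588}{563547} = \frac{553180}{563547} \approx 0.982
$$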
I nuked them manually:
mysql> DELETE from eventlog WHERE action='app.up';
Query OK, 446782 rows affected (8 min 1,59 sec)

mysql> DELETE from eventlog WHERE action='app.down';
Query OK, 106588 rows affected (9 min 16,27 sec)
That took a whopping 17 minutes just to delete entries!
-
I pushed fixes for this now.
The issue is that if the mail container is down or mail is not working, all the app up/down events accumulate (it tries to send an email whenever an app goes up or down). Since the fix is a bit involved, a workaround might be to disable the app up/down email notifications in the Notification view (I haven't tried this, but it would at least have fixed my problem).
-
@robi said in Cloudron instance scaling issues after a few hours / couple of days, apps responsive but showing a permanent "Starting..." status:
I have installed an app that does not have health checks and it shows as Not Responding in the dashboard even though it works fine.
Does this mean you have a custom app which does not properly respond to healthCheckUrl?