Emailing notifications of certain crucial system events, such as full disk space
-
@BrutalBirdie Can you describe how your Zabbix system is set up? I'm guessing it runs outside of the Cloudron VPS, right?
@benborges
Zabbix runs on a master node, and each client has an agent. (Yes, the master is an external system.)
Zabbix can monitor clients in active or passive mode.
Passive means the master asks the client for data and the client delivers it. This does not always work in special networks where the master cannot reach the client.
In that case you use active monitoring, where the client reports all data to the master at a set interval. For large-scale monitoring there can be a master / slave / proxy setup. (Search for "Zabbix HA Cluster Setup" for more details.)
For more detail, please consult the docs: https://www.zabbix.com/documentation/current/en/manual/introduction/about
-
I also encountered a "disk full" issue, and I was quite dumbfounded that there was no email notification for it; that seems pretty basic as far as monitoring goes.
Cloudron is well placed to add this functionality, and it would save us so many headaches.
-
@AmbroiseUnly for some reason, Linux doesn't have an event for nearing full disk space. The only way to detect it is to keep polling aggressively, but this causes a lot of disk churn. Also, the notification is then limited by how frequently you can poll. There is some quota support, but it also needs kernel support (which Cloudron cannot control).
-
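For reference, the polling check discussed above can be quite small. This is a minimal sketch in Python, assuming the standard-library `shutil.disk_usage` call is an acceptable stand-in for `df`. It is not how Cloudron does it, and the function names and the 90% default are made up for illustration.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return how full the filesystem holding `path` is, in percent."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        # Some pseudo-filesystems report a size of zero; treat them as empty.
        return 0.0
    return 100.0 * usage.used / usage.total


def is_nearly_full(path: str = "/", threshold_percent: float = 90.0) -> bool:
    """True when usage is at or above the (hypothetical) alert threshold."""
    return disk_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    print(f"/ is {disk_usage_percent('/'):.1f}% full")
```

The figure can differ slightly from what `df` prints, because `df` bases its percentage on the space available to unprivileged users. Like any poll, this only notices a full disk as often as it is run.
-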
Would it be possible to have a guide then? Something with best-practices in mind.
Another user mentioned Zabbix, but it feels complicated to use (the doc isn't so friendly, it doesn't look simple). I don't know if that really is complex to set up, but a guide with some sort of "Cloudron recommendation" would be really nice.
Typically, something that covers how to get alerted (email) when disk reaches 50/75/90/95/99/100% capacity, and maybe also some CPU watchers. A guide covering it from "how to install it" to "how to configure it" would be really helpful.
Also, if it uses a Cloudron App, it might also be beneficial for Cloudron, because customers would reach 3 Cloudron apps quicker, meaning more sales for you.
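To make the 50/75/90/95/99/100% idea concrete, here is a rough sketch in Python of the threshold logic only (no email sending). It assumes the caller remembers the last threshold it alerted on, so that each level triggers one message rather than one per check. `THRESHOLDS`, `highest_crossed` and `should_alert` are illustrative names, not part of any existing tool.

```python
from typing import Optional, Sequence

# Alert levels taken from the post above, in percent of disk capacity.
THRESHOLDS = (50, 75, 90, 95, 99, 100)


def highest_crossed(percent_used: float,
                    thresholds: Sequence[int] = THRESHOLDS) -> Optional[int]:
    """Return the highest threshold that usage has reached, or None."""
    crossed = [t for t in thresholds if percent_used >= t]
    return max(crossed) if crossed else None


def should_alert(percent_used: float,
                 last_alerted: Optional[int],
                 thresholds: Sequence[int] = THRESHOLDS) -> Optional[int]:
    """Return the threshold to alert on, or None if nothing new was crossed."""
    level = highest_crossed(percent_used, thresholds)
    if level is None:
        return None
    if last_alerted is not None and level <= last_alerted:
        return None  # already notified for this level or a higher one
    return level
```

A caller would store the returned level (for example in a small state file) and clear it once usage drops again, so that a later refill triggers fresh alerts.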
-
You could do something like this via cron and maybe ntfy.
We had a discussion like this already, see an example here: https://forum.cloudron.io/post/72148
Otherwise, googling "cron alert disk full mail" brought up e.g. https://askubuntu.com/questions/1503361/script-to-notify-via-email-when-low-on-disk-space or https://github.com/corneliusroot/QuickStatus
-
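As an illustration of the cron plus ntfy suggestion, here is a hedged sketch in Python. It assumes an ntfy server that accepts a plain HTTP POST of the message body to a topic URL, with `Title` and `Priority` headers. The topic URL, the script path in the cron comment and the function names are placeholders, not taken from the linked posts.

```python
import shutil
import urllib.request

# Hypothetical cron entry, checking every 30 minutes:
#   */30 * * * * /usr/bin/python3 /root/disk_alert.py


def build_ntfy_request(topic_url: str, message: str,
                       title: str = "Disk space warning",
                       priority: str = "high") -> urllib.request.Request:
    """Build (but do not send) a POST request that publishes to an ntfy topic."""
    return urllib.request.Request(
        topic_url,
        data=message.encode("utf-8"),
        headers={"Title": title, "Priority": priority},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def check_and_notify(path: str, topic_url: str,
                     threshold_percent: float = 90.0, sender=send) -> bool:
    """Notify when `path` is at or above the threshold; return True if sent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return False
    percent = 100.0 * usage.used / usage.total
    if percent < threshold_percent:
        return False
    sender(build_ntfy_request(topic_url, f"{path} is {percent:.0f}% full"))
    return True
```

A script run by cron would end with something like `check_and_notify("/", "https://ntfy.example.com/disk-alerts")`, where the URL is your own topic. The `sender` parameter exists only so the function can be tried without network access.
-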
For anyone interested in configuring proper monitoring on your Cloudron server, I wrote a guide about it, and I hope you'll find it useful!
It's the kind of guide I wish I had found when first looking at this topic.
-
I am wondering if this might be possible by now. I just got the notification "Server is running out of disk space" on the Cloudron notification tab. Since there is already the possibility to subscribe to email alerts for events like "App is down", couldn't this event be added as well?
I like the idea of Cloudron being a self-contained system, so I don't want to add a custom monitoring system that needs to be maintained alongside it.
-
@AmbroiseUnly for some reason, Linux doesn't have an event for nearing full disk space. The only way to detect it is to keep polling aggressively, but this causes a lot of disk churn. Also, the notification is then limited by how frequently you can poll. There is some quota support, but it also needs kernel support (which Cloudron cannot control).
@girish How about a more indirect solution?
Something that correlates to disk space, such as inodes or other low cost checks.
If not that, then how about creating a safety system for Cloudron, let's call it AirBag with ABS brakes for when you're about to crash it deploys in a controlled way.
AirBag with ABS might look like a series of 10 eager zeroed files evenly dividing a threshold of say 1GB always present on disk. When the system runs out of disk, 1 of 10 is deleted and a notification is sent. Repeat 4 more times, then wait.
That way the system has a controlled descent to 0 and some left for when an admin comes by and needs some space to work with.
Thoughts?
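To make the AirBag idea a bit more tangible, here is a rough sketch in Python, under my own assumptions: the ballast files are written with real zero bytes (not created sparse), half of them are kept back for the admin, and sending the notification is left to the caller. The file names, `create_ballast`, `release_one` and the `keep` parameter are invented for illustration.

```python
import os
from pathlib import Path
from typing import List, Optional


def create_ballast(directory: Path, count: int = 10,
                   total_bytes: int = 1 << 30) -> List[Path]:
    """Write `count` zero-filled files that together occupy about `total_bytes`."""
    directory.mkdir(parents=True, exist_ok=True)
    size = total_bytes // count
    chunk = b"\0" * (1024 * 1024)
    paths = []
    for index in range(count):
        path = directory / f"ballast-{index:02d}.bin"
        with open(path, "wb") as handle:
            remaining = size
            while remaining > 0:
                # Write actual zeros so the blocks are allocated up front.
                remaining -= handle.write(chunk[:remaining])
            handle.flush()
            os.fsync(handle.fileno())
        paths.append(path)
    return paths


def release_one(directory: Path, keep: int = 5) -> Optional[Path]:
    """Delete one ballast file, unless only the `keep` reserved ones are left."""
    remaining = sorted(directory.glob("ballast-*.bin"))
    if len(remaining) <= keep:
        return None  # the rest is reserved for the admin to work with
    victim = remaining[-1]
    victim.unlink()
    return victim
```

The caller would run `release_one` when a write fails for lack of space and send a notification each time it returns a path. One caveat: on filesystems that compress or deduplicate data, zero-filled files may not actually reserve the space.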
-
Email notification can be added, but it will be unreliable (and we don't want to mislead users). See https://forum.cloudron.io/topic/7555/emailing-notifications-of-certain-crucial-system-events-such-as-full-disk-space/8
-
@joseph said in Emailing notifications of certain crucial system events, such as full disk space:
Email notification can be added, but it will be unreliable (and we don't want to mislead users). See https://forum.cloudron.io/topic/7555/emailing-notifications-of-certain-crucial-system-events-such-as-full-disk-space/8
Sure, I do understand those limitations. I was just thinking that it would be nice to have an email notification equivalent (maybe with a note pointing out the limitations) for every notification type shown in the Cloudron dashboard.
-
Currently, we run df every 30 mins. Maybe this is accurate enough already. In which case, what is missing is the email notification. Can add that for next release.
@girish That sounds great! The last two incidents where this would have helped me developed over several days (exploding Rocket.Chat logs and syslog.js), so this should be precise enough to prevent this type of situation.