Notifications not showing for backup failures with no disk space left

d19dotca

It seems that ever since 7.2, whenever a backup fails because of lack of disk space, there's no notification in the dashboard. Was that change by design or is that a defect in 7.2.5? If by design, how many failures would it take to then show a notification?

FWIW, my two cents... notifications should be done in the Dashboard for things which usually would require a manual intervention such as disk space issues, but could reasonably be delayed in showing for things that are often just intermittent like connection delays / timeouts, etc.

If this is by design, is it possible to modify that behaviour at all? And if a defect, is there anything I can do to help?

Here are the logs of the latest failure in my backup, but there's no actual notification present in my Dashboard:

Jul 13 07:16:01 box:tasks update 15964: {"percent":98.5,"message":"Copying /mnt/cloudron-backups/snapshot/box to /mnt/cloudron-backups/2022-07-13-140001-726/box_v7.2.5"}
Jul 13 07:16:01 box:shell copy spawn: /bin/cp -al /mnt/cloudron-backups/snapshot/box /mnt/cloudron-backups/2022-07-13-140001-726/box_v7.2.5
Jul 13 07:16:01 box:shell copy (stdout): /bin/cp: cannot create directory '/mnt/cloudron-backups/2022-07-13-140001-726/box_v7.2.5': No space left on device
Jul 13 07:16:01 box:shell copy code: 1, signal: null
Jul 13 07:16:01 box:backuptask copy: copied successfully to 2022-07-13-140001-726/box_v7.2.5. Took 0.012 seconds
Jul 13 07:16:01 box:taskworker Task took 960.252 seconds
Jul 13 07:16:01 box:tasks setCompleted - 15964: {"result":"box_box_v7.2.5_fb71c1c5c67946490199748613b423eca452263507d79d4e049de5440b1d86ef","error":null}
Jul 13 07:16:01 box:tasks update 15964: {"percent":100,"result":"box_box_v7.2.5_fb71c1c5c67946490199748613b423eca452263507d79d4e049de5440b1d86ef","error":null}

By the way... I noticed that it ends with "error: null" so maybe that's why it's not triggering a failure? But earlier in the logs it shows No space left on device and a copy code of 1, so I presume it failed... right?
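To make the question concrete, here is a minimal shell sketch of the check I would have expected around that copy step. It is purely illustrative and not Cloudron's actual code: the function name `copy_snapshot` and its arguments are made up for this example. The point is only that a non-zero exit status from `cp` should propagate as a failure instead of being logged as "copied successfully".

```shell
#!/bin/bash
# Hypothetical helper (not part of Cloudron): hard-link copy a snapshot
# directory and report failure whenever cp exits with a non-zero status.
copy_snapshot() {
  local src="$1" dest="$2"
  if ! cp -al "$src" "$dest"; then
    # cp failed (for example "No space left on device"); surface the error.
    echo "copy failed: $src -> $dest" >&2
    return 1
  fi
  echo "copied successfully to $dest"
}
```

With a check like this, the "copy code: 1" from the log would end up as a task error rather than "error: null".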

girish

@d19dotca We wait for 3 backup failures before raising the notification. This change was made because previously the complaint was that we should not raise a notification immediately just because one backup failed (since network, disk etc can all fail in various ways intermittently).

What do you think can be made configurable here? I think spotting specific errors like "no disk space" is not easy since it involves grepping the output of various tools.
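To illustrate the kind of output matching this would involve, here is a rough sketch. The function name `is_disk_full_error` is hypothetical, and the matched string is simply the message `cp` printed in the log above; other tools, backends and locales word this differently, which is exactly why the approach is brittle.

```shell
#!/bin/bash
# Hypothetical classifier (not Cloudron code): succeeds if the captured
# output of a backup tool looks like a "disk full" error.
# Assumption: the tool prints the standard English ENOSPC message.
is_disk_full_error() {
  case "$1" in
    *"No space left on device"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller would capture the tool's combined output and pass it in as a single string. Every backend (local disk, S3, rsync and so on) would need its own set of patterns.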

d19dotca

@girish That’s fair, I understand it may not be an easy fix. In an ideal world (and why I’m raising it), errors such as no disk space left would trigger an immediate alert because there basically has to be a manual intervention to fix as opposed to just transient network errors for example.

My fear here is for people who only back up once a day or once a week, for example. The current logic would dictate that the admin would be without backups for 3 days or even 3 weeks before being notified, depending on their backup schedule. I think that’s where the 3x rule currently falls apart.

Some possible solutions / improvements:

Maybe it’s possible to trigger an alert based on timing… for example if it’s been 24 hours since the first failure and there’s been no successful backup since then… then throw the alert.
Maybe the simplest solution is to make it a 2x rule instead for now?
Or maybe we can just simply have that number be configurable? So for example we can set how many failures we are willing to accept before we are notified? Maybe that’s the better solution for now if we can’t easily decipher the type of error and make logic based off that?
Lastly, maybe the logic can change based on the type of backup endpoint? For example, there will basically never be network issues when backing up to a local disk / mounted disk; it should only really fail when the disk isn’t mounted properly or the disk is full, both of which require manual intervention. With a hosted S3 type of backup, though, there are a lot more things that can happen, and most of them are outside the control of the user, so in that case it makes sense not to alert so often.
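As a rough sketch of how the timing idea and the configurable threshold could combine (this is only an illustration of the suggestion, not how Cloudron decides today; the function name `should_notify` and all of its parameters are hypothetical):

```shell
#!/bin/bash
# Hypothetical decision helper, not Cloudron's implementation.
# Arguments:
#   $1 consecutive backup failures so far
#   $2 configurable failure threshold (e.g. 3)
#   $3 time of the first failure, in epoch seconds
#   $4 current time, in epoch seconds
#   $5 maximum seconds to tolerate without a successful backup (e.g. 86400)
should_notify() {
  local failures="$1" threshold="$2" first_failure="$3" now="$4" max_age="$5"

  # Rule 1: enough consecutive failures.
  if [ "$failures" -ge "$threshold" ]; then
    return 0
  fi

  # Rule 2: at least one failure, and too long since the first one.
  if [ "$failures" -gt 0 ] && [ $(( now - first_failure )) -ge "$max_age" ]; then
    return 0
  fi

  return 1
}
```

For example, with a threshold of 3 and a 24-hour limit, a weekly backup schedule would alert about a day after the first failure instead of three weeks later.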

Hopefully that makes sense. Let me know if I can clarify at all.

timconsidine

@d19dotca
My 2p : the discussion is valid and the points are worth considering, and I wouldn't want to detract from resolving it.

But in the interim I would recommend setting up ntfy.sh, using their free hosted service or installing my custom cloudron app with a cron job which reports on disk space.
I get a morning report on all servers similar to this :

If disk space is fast changing, adjust cron job frequency to e.g. hourly.

Cron job is just simple bash script as below.
This could be improved with a conditional IF based on parsed output of df -h command whether to send a notification according to free space remaining.
Remote backup storage can be queried using e.g. rclone size remoteserver: >> /root/ntfy-msg.txt

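#!/bin/bash
# Build the daily report, then upload it to the ntfy topic.
echo 'KASM' > /root/ntfy-msg.txt
date >> /root/ntfy-msg.txt
# Include the reboot notice if the system is waiting for a restart.
if [ -f /var/run/reboot-required ]; then
  cat /var/run/reboot-required >> /root/ntfy-msg.txt
fi
# Disk usage of the root filesystem (absolute path so it also works from cron).
df -h / >> /root/ntfy-msg.txt
curl https://ntfy.domain.tld/kasm -T /root/ntfy-msg.txt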

I like seeing the raw results each morning, so have not yet added conditional logic to the bash script.
But I do query remote storage such as Scaleway and Hetzner Storage Box for free space, and send notification based on that.

I know this is not at all a solution to the issue, but it is an immediate workaround because having a current backup is critical to system security.

d19dotca

@timconsidine That's a cool idea - I'll definitely look into that (can likely use that for more use-cases too). But yeah I'd like to see the notifications improved directly (if possible) in Cloudron.

timconsidine

@d19dotca I have updated my bash script to check for disk used status and then send a ntfy.sh message using self-hosted ntfy

Adapted the script from https://scriptcrunch.com/linux-shell-script-to-automate-disk-usage-monitoring/ which is designed for email if people prefer an email.

#!/bin/bash

# Alert threshold: percentage of disk space used.
VALUE=80

# Check the filesystem on /dev/sda2; adjust the device to match your server.
for line in $(df -hP | grep -E '^/dev/sda2' | awk '{ print $1 "_:_" $5 }')
  do
    FILESYSTEM=$(echo "$line" | awk -F"_:_" '{ print $1 }')
    DISK_USAGE=$(echo "$line" | awk -F"_:_" '{ print $2 }' | cut -d'%' -f1 )

    if [ "$DISK_USAGE" -ge "$VALUE" ]; then
      # Build the message, send it to the ntfy topic, then clean up.
      echo 'MyDocker - DISK ALERT !!' > /root/disk-msg.txt
      date >> /root/disk-msg.txt
      echo "$FILESYSTEM is now $DISK_USAGE%" >> /root/disk-msg.txt
      curl https://ntfy.domain.tld/mydocker -T /root/disk-msg.txt
      rm /root/disk-msg.txt
    fi
done

Hope it helps someone.
This script runs every 4 hours via cron.

andreas

Personally, our requirements are very basic. We wish to have some global notification settings for warnings and alerts via email. Otherwise, if we're not checking our Cloudron dashboard frequently, we miss important things we need to handle as admins.

Failed backups or inaccessible backup locations could be part of these notifications via email.

girish

There is a bug in the current release where the code crashes when trying to send a notification if a backup failed. This is fixed. I think in the coming releases we can explore more notification options, but at least now you should get an email.


Cloudron Forum
