Cloudron makes it easy to run web apps like WordPress, Nextcloud, GitLab on your server. Find out more or install now.


Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • Bookmarks
  • Search
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Brand Logo

Cloudron Forum

Apps | Demo | Docs | Install
  1. Cloudron Forum
  2. Discuss
  3. Essential information about failures are not forwarded to the sysadmin

Essential information about failures are not forwarded to the sysadmin

Scheduled Pinned Locked Moved Discuss
notificationsbackups
13 Posts 6 Posters 1.9k Views 6 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • C Offline
      C Offline
      chymian 0
      wrote on last edited by girish
      #1

      Just saw the latest announcment and was figuring out (at the same time) that cloudron is not informing me any longer, about mission-critical errors, like when a backup fails and:

      … Backup renewal notification will only be sent if 3 consecutive backups fail. This way we allow for "external services" to fail now and then without being too aggressive about notifying user.

      Since the Users data are the most precious thing in a installation like that, it's just not acceptable, that in a worse case scenario, 3 days of these are lost!

      allow for "external services" to fail now and then.

      IMHO, that's the wrong strategy. if external services fails, the sysadmin should fix it A.S.A.P. or go for a reliable service.
      not reporting system critical failures at once compromises the security and reliability of the whole installation.

      during my research i saw a statement, "that one had to log in to oversee the system...." (and find out, whether it has failed already)
      imagine, one sysadmin is responsible for more then just one cloudron installation and cloudron is just one VM/setup of about a dozend+ or more installations of diff. apps/setups, at all kind of different places/customers and you suggest, to login to everyone to see whether it's healthy? think big, guys. that's just not doable.

      all mission-critical information has to be delivered in realtime to the responsible person/team.
      instead of turning of critical information, a way to configure different notification-channels for diff. typ/classes/importance of notifications like matrix/telegarm/mail would make much more sense - IMHO.

      nebulonN 1 Reply Last reply
      0
      • C chymian 0

        Just saw the latest announcment and was figuring out (at the same time) that cloudron is not informing me any longer, about mission-critical errors, like when a backup fails and:

        … Backup renewal notification will only be sent if 3 consecutive backups fail. This way we allow for "external services" to fail now and then without being too aggressive about notifying user.

        Since the Users data are the most precious thing in a installation like that, it's just not acceptable, that in a worse case scenario, 3 days of these are lost!

        allow for "external services" to fail now and then.

        IMHO, that's the wrong strategy. if external services fails, the sysadmin should fix it A.S.A.P. or go for a reliable service.
        not reporting system critical failures at once compromises the security and reliability of the whole installation.

        during my research i saw a statement, "that one had to log in to oversee the system...." (and find out, whether it has failed already)
        imagine, one sysadmin is responsible for more then just one cloudron installation and cloudron is just one VM/setup of about a dozend+ or more installations of diff. apps/setups, at all kind of different places/customers and you suggest, to login to everyone to see whether it's healthy? think big, guys. that's just not doable.

        all mission-critical information has to be delivered in realtime to the responsible person/team.
        instead of turning of critical information, a way to configure different notification-channels for diff. typ/classes/importance of notifications like matrix/telegarm/mail would make much more sense - IMHO.

        nebulonN Offline
        nebulonN Offline
        nebulon
        Staff
        wrote on last edited by
        #2

        @chymian-0 I can understand your point. The main reason we did this is because in most cases of such one-off failures there were intermittent issues and by the time the user investigated or contacted us, the issue was already resolved since Cloudron will retry anyways. So now a notification is only raised after consecutive failures, which is when it gets problematic. So essentially we are more relying on the retry since failures just happen and in nearly all such cases there is nothing to be done about it in hindsight, so there is really no need to invest time or energy unless it fails even after retry.

        C 1 Reply Last reply
        3
        • nebulonN nebulon

          @chymian-0 I can understand your point. The main reason we did this is because in most cases of such one-off failures there were intermittent issues and by the time the user investigated or contacted us, the issue was already resolved since Cloudron will retry anyways. So now a notification is only raised after consecutive failures, which is when it gets problematic. So essentially we are more relying on the retry since failures just happen and in nearly all such cases there is nothing to be done about it in hindsight, so there is really no need to invest time or energy unless it fails even after retry.

          C Offline
          C Offline
          chymian 0
          wrote on last edited by
          #3

          @nebulon backup failed completely yesterday without any notification. so I missed a complete day of backup without being informed about it. that's not acceptable!

          nebulonN 1 Reply Last reply
          1
          • C chymian 0

            @nebulon backup failed completely yesterday without any notification. so I missed a complete day of backup without being informed about it. that's not acceptable!

            nebulonN Offline
            nebulonN Offline
            nebulon
            Staff
            wrote on last edited by
            #4

            @chymian-0 if it failed completely even after retry, it indeed should have notified, we have to check why this didn't happen.

            C d19dotcaD 2 Replies Last reply
            0
            • nebulonN nebulon

              @chymian-0 if it failed completely even after retry, it indeed should have notified, we have to check why this didn't happen.

              C Offline
              C Offline
              chymian 0
              wrote on last edited by
              #5

              @nebulon, ok now I'm getting to understand you better.
              It should have sent a mail after complete failure of backup, not only after 3 days… - I see.

              but even in the past, when backup failed, I got only 1 single mail.
              from my understanding now, it should have sent me a mail per try - which never happened.

              thx for clarification.
              do you need any logs/info?

              d19dotcaD 1 Reply Last reply
              0
              • C chymian 0

                @nebulon, ok now I'm getting to understand you better.
                It should have sent a mail after complete failure of backup, not only after 3 days… - I see.

                but even in the past, when backup failed, I got only 1 single mail.
                from my understanding now, it should have sent me a mail per try - which never happened.

                thx for clarification.
                do you need any logs/info?

                d19dotcaD Offline
                d19dotcaD Offline
                d19dotca
                wrote on last edited by
                #6

                @chymian-0 Side note not meant to derail the great conversation (as I can certainly see both sides to the equation here)... If your backups are critical (which they are for many of us), it'll be infinitely better to backup multiple times a day, not just once. As a side benefit, you'd be notified sooner if any failures too. Personally I backup about 4 times a day (though I tweak this occasionally).

                --
                Dustin Dauncey
                www.d19.ca

                1 Reply Last reply
                0
                • nebulonN nebulon

                  @chymian-0 if it failed completely even after retry, it indeed should have notified, we have to check why this didn't happen.

                  d19dotcaD Offline
                  d19dotcaD Offline
                  d19dotca
                  wrote on last edited by d19dotca
                  #7

                  @nebulon It may be a good idea to allow customization of the notification system, so that those users who deem everything uber critical (for business reasons and such) can be notified upon every failure, and those who are more using Cloudron for their hobby can choose to not be bugged by failures as often due to the very real reason that led to this change in the first place... too many false-positives or intermittent issues that no sysadmin can necessarily do anything about anyways (i.e. if the object storage provider is having issues).

                  If you want to keep it 'simple' still, then maybe just have two options for users... aggressive or non-aggressive. lol.

                  --
                  Dustin Dauncey
                  www.d19.ca

                  1 Reply Last reply
                  1
                  • girishG Offline
                    girishG Offline
                    girish
                    Staff
                    wrote on last edited by
                    #8

                    To add to this: the backup failure notifications after three failures and not after 3 days. So, if you have backups at a higher interval, you will get notified soon enough.

                    jdaviescoatesJ 1 Reply Last reply
                    0
                    • girishG girish

                      To add to this: the backup failure notifications after three failures and not after 3 days. So, if you have backups at a higher interval, you will get notified soon enough.

                      jdaviescoatesJ Offline
                      jdaviescoatesJ Offline
                      jdaviescoates
                      wrote on last edited by jdaviescoates
                      #9

                      @girish @nebulon when a backup fails, does it retry again before the next scheduled backup is due? Or is the "retry" basically the next scheduled backup? I think the latter, but I'm not actually sure.

                      I use Cloudron with Gandi & Hetzner

                      girishG 1 Reply Last reply
                      1
                      • jdaviescoatesJ jdaviescoates

                        @girish @nebulon when a backup fails, does it retry again before the next scheduled backup is due? Or is the "retry" basically the next scheduled backup? I think the latter, but I'm not actually sure.

                        girishG Offline
                        girishG Offline
                        girish
                        Staff
                        wrote on last edited by
                        #10

                        @jdaviescoates There's two levels of retry - things like network errors, transient api errors etc are retried immediately. The other retry is in the next scheduled backup time. There is no other retry between scheduled backup time.

                        M 1 Reply Last reply
                        2
                        • girishG girish

                          @jdaviescoates There's two levels of retry - things like network errors, transient api errors etc are retried immediately. The other retry is in the next scheduled backup time. There is no other retry between scheduled backup time.

                          M Offline
                          M Offline
                          msbt
                          App Dev
                          wrote on last edited by
                          #11

                          Just chiming in here because I came back from a short trip and thought everything would be okay (since I didn't get any emails that said otherwise) but then I saw this (no, the first notification is not about the primary domain):

                          e5d6b567-a796-45bb-84e1-9fd3b3756eae-grafik.png

                          and the Backup-view
                          55a4e8c0-bc8b-4183-9e9c-fe7ae3ba4573-grafik.png

                          So instead of 7 backups (1 per day) I only got two left, the others apparently got cleaned after the backup failed.

                          Here are my questions:

                          As you can see, I'm using encrypted tgz as storage format. Wouldn't it double the required space if I added more times to the scheduler?

                          Why was there no notification if it failed so often?

                          The logs of the last crash (the app in question was the culprit for earlier crashes, a mid-size Magento store whose cache might interfere with the backup process, but I thought we fixed that along the way):

                          Sep 13 01:09:35 box:backups some-app.at Unable to backup BoxError: Backuptask crashed
                          at /home/yellowtent/box/src/backups.js:901:29
                          at f (/home/yellowtent/box/node_modules/once/once.js:25:25)
                          at ChildProcess.<anonymous> (/home/yellowtent/box/src/shell.js:77:9)
                          at ChildProcess.emit (events.js:315:20)
                          at Process.ChildProcess._handle.onexit (internal/child_process.js:277:12) {
                          reason: 'Internal Error',
                          details: {}
                          
                          girishG 1 Reply Last reply
                          2
                          • M msbt

                            Just chiming in here because I came back from a short trip and thought everything would be okay (since I didn't get any emails that said otherwise) but then I saw this (no, the first notification is not about the primary domain):

                            e5d6b567-a796-45bb-84e1-9fd3b3756eae-grafik.png

                            and the Backup-view
                            55a4e8c0-bc8b-4183-9e9c-fe7ae3ba4573-grafik.png

                            So instead of 7 backups (1 per day) I only got two left, the others apparently got cleaned after the backup failed.

                            Here are my questions:

                            As you can see, I'm using encrypted tgz as storage format. Wouldn't it double the required space if I added more times to the scheduler?

                            Why was there no notification if it failed so often?

                            The logs of the last crash (the app in question was the culprit for earlier crashes, a mid-size Magento store whose cache might interfere with the backup process, but I thought we fixed that along the way):

                            Sep 13 01:09:35 box:backups some-app.at Unable to backup BoxError: Backuptask crashed
                            at /home/yellowtent/box/src/backups.js:901:29
                            at f (/home/yellowtent/box/node_modules/once/once.js:25:25)
                            at ChildProcess.<anonymous> (/home/yellowtent/box/src/shell.js:77:9)
                            at ChildProcess.emit (events.js:315:20)
                            at Process.ChildProcess._handle.onexit (internal/child_process.js:277:12) {
                            reason: 'Internal Error',
                            details: {}
                            
                            girishG Offline
                            girishG Offline
                            girish
                            Staff
                            wrote on last edited by
                            #12

                            @msbt said in Essential information about failuers are not forwarded to the sysadmin:

                            Why was there no notification if it failed so often?

                            It should have tried to send an email notification. Let me log this, so in the future we can atleast identify if the email is not sent out at all or if the email failed to send or some other issue.

                            M 1 Reply Last reply
                            3
                            • girishG girish

                              @msbt said in Essential information about failuers are not forwarded to the sysadmin:

                              Why was there no notification if it failed so often?

                              It should have tried to send an email notification. Let me log this, so in the future we can atleast identify if the email is not sent out at all or if the email failed to send or some other issue.

                              M Offline
                              M Offline
                              msbt
                              App Dev
                              wrote on last edited by
                              #13

                              @girish do you need anything from me or can you replicate that by yourself?

                              1 Reply Last reply
                              1
                              Reply
                              • Reply as topic
                              Log in to reply
                              • Oldest to Newest
                              • Newest to Oldest
                              • Most Votes


                                • Login

                                • Don't have an account? Register

                                • Login or register to search.
                                • First post
                                  Last post
                                0
                                • Categories
                                • Recent
                                • Tags
                                • Popular
                                • Bookmarks
                                • Search