Cloudron Forum

Automatically repair app when the HealthCheck goes down (Not Responding)

Feature Requests · healthmonitoring · 47 posts, 9 posters

#1 Lonkle (edited)

I think there should be an option for a single "automatic repair" of an app as soon as it shows up as "Not Responding" (what do you have to lose at that point, really?). It should call the /repair endpoint, not the /restart endpoint, of your almost-fully-documented REST API 😉. /repair almost never fails, and if this feature runs automatically in the background, it doesn't really matter how long the app takes to come back, while /repair is much more likely to succeed. So if you only do it once, as soon as the app goes down, a hard reset via /repair beats a soft reset via /restart.

#2 fbartels (App Dev, edited)

@Lonk such functionality would need some parameters to work within, like "needs to be unresponsive for x checks" and "only try restarting y times before giving up".

Supervisor can already help in some of these cases: I think if one of its processes fails, it tries to restart it.
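
The two parameters fbartels names can be sketched as a small watchdog policy. This is a minimal, hypothetical Python sketch, not Cloudron's code: `RepairPolicy`, `observe` and the injected `repair` callable are invented names, and how each healthcheck result is obtained and how the repair is actually triggered (for example through the /repair endpoint discussed in this thread) are left as assumptions. Setting `max_attempts=1` gives the single-attempt behaviour proposed in this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class RepairPolicy:
    """Hypothetical watchdog: repair an app only after `failure_threshold`
    consecutive failed healthchecks, and at most `max_attempts` times."""

    failure_threshold: int = 3  # fbartels' "unresponsive for x checks"
    max_attempts: int = 1       # fbartels' "only try y times"; 1 means "try once"
    repair: Callable[[str], None] = lambda app_id: None  # injected action (assumption)
    _failures: Dict[str, int] = field(default_factory=dict, init=False)
    _attempts: Dict[str, int] = field(default_factory=dict, init=False)

    def observe(self, app_id: str, healthy: bool) -> str:
        """Record one healthcheck result for `app_id` and return the action taken."""
        if healthy:
            # A passing check resets both counters, so a later outage gets a fresh try.
            self._failures[app_id] = 0
            self._attempts[app_id] = 0
            return "ok"
        self._failures[app_id] = self._failures.get(app_id, 0) + 1
        if self._failures[app_id] < self.failure_threshold:
            return "waiting"
        if self._attempts.get(app_id, 0) >= self.max_attempts:
            return "gave_up"  # leave it to a human from here on
        self._attempts[app_id] = self._attempts.get(app_id, 0) + 1
        self._failures[app_id] = 0  # give the repaired app a full window to recover
        self.repair(app_id)
        return "repaired"


if __name__ == "__main__":
    repaired = []
    policy = RepairPolicy(failure_threshold=3, max_attempts=1, repair=repaired.append)
    for ok in [False, False, False, False, False, False]:
        print(policy.observe("wordpress", ok))
    # prints, one per line: waiting, waiting, repaired, waiting, waiting, gave_up
```

Resetting the failure counter after a repair gives the rebuilt app a full window of checks before the policy gives up and leaves the app to the admin.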

#3 mehdi (App Dev)

I do not understand what you are proposing here. Should apps implement this /repair or /restart API endpoint? If so, how are they expected to respond to it if they are already unresponsive by that point? Or is it supposed to be on the platform? Or do these endpoints already exist and you are proposing a change of behaviour? I am completely lost here ^^

#4 girish (Staff)

I would ideally like to remove Cloudron's healthcheck field and replace it with Docker's own HEALTHCHECK (https://github.com/moby/moby/pull/22719). When we started out, that feature didn't exist in Docker, and it may now be able to replace what Cloudron does internally. Once we do that, we can get automatic restarts etc. from upstream Docker. That said, I note that https://github.com/moby/moby/issues/28400 has been open for over 2 years now.

#5 robi

              Perhaps upvote this or add additional comments there to make it happen.

              Conscious tech

#6 Lonkle

@fbartels said in Automatically "/repair" app when the HealthCheck goes down (Not Responding):

> @Lonk such functionality would need some parameters to work within, like "needs to be unresponsive for x checks" and "only try restarting y times before giving up".

box already works in a similar fashion. It waits a good few minutes and fails something like 30 healthchecks before labeling an app unresponsive. That makes it the perfect moment to call the /repair endpoint on the app, because you literally have nothing to lose: the app isn't responding, and the undocumented /repair API endpoint has solved far more issues than a simple restart.

This is all within the Dashboard and box system. My ideal number of tries is one: if one /repair doesn't fix it, it's unlikely a second will.

#7 Lonkle

@mehdi said:

> I do not understand what you are proposing here. Should apps implement this /repair or /restart API endpoint? If so, how are they expected to respond to it if they are already unresponsive by that point? Or is it supposed to be on the platform? Or do these endpoints already exist and you are proposing a change of behaviour? I am completely lost here ^^

Box / the dashboard would /repair the app. It's an undocumented Cloudron endpoint, and it always fixes any issue I have with apps, unlike simply restarting them. Right now, an app can stop responding to the HEALTHCHECK, and within 5 to 10 minutes Cloudron labels it unresponsive.

I'm asking: why wouldn't the system try a /repair at that point? There's nothing to lose and a working app to gain.

#8 Lonkle (edited)

@girish said:

> I would ideally like to remove Cloudron's healthcheck field and replace it with Docker's own HEALTHCHECK (https://github.com/moby/moby/pull/22719). When we started out, that feature didn't exist in Docker, and it may now be able to replace what Cloudron does internally. Once we do that, we can get automatic restarts etc. from upstream Docker. That said, I note that https://github.com/moby/moby/issues/28400 has been open for over 2 years now.

Completely agree. Ideally we'd be using Docker HEALTHCHECKs, but until then this seems like a single line of code: if an app becomes unresponsive, /repair it. Maybe it'll become responsive again, maybe it won't, but at least your system tried to fix it automatically before notifying a human who has to take the /repair step manually anyway (and who knows what time it is for them).

#9 Lonkle

@girish said:

> That said, I note that https://github.com/moby/moby/issues/28400 has been open for over 2 years now.

You built a home-grown health check before Docker did. So you could also build a home-grown /repair that runs once after “UNHEALTHY” (“Not Responding” in Dashboard terms). What do we have to lose by doing so, especially given it's just a single endpoint already exposed in your API? This would be a quick option to add and an extra benefit of your home-grown healthcheck, even before you switch to Docker's built-in one.

#10 robi

@Lonk @girish I had the same thought: if Docker now reports the correct status, it's easy to grab the status from docker ps and restart the container, like the guy does in cron.
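
robi's cron approach can be sketched roughly as follows, assuming the containers define a Docker HEALTHCHECK (without one, `docker ps` reports no health status and the `health=unhealthy` filter matches nothing). This is a hypothetical Python illustration: the function names are invented, the script would be run periodically (for example from cron), and it performs a plain `docker restart` directly against Docker, which is not the same thing as Cloudron's /repair.

```python
import subprocess
from typing import List


def parse_ids(output: str) -> List[str]:
    """Split `docker ps --format '{{.ID}}'` output into one ID per non-empty line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def unhealthy_container_ids() -> List[str]:
    """List IDs of running containers whose HEALTHCHECK currently reports 'unhealthy'."""
    out = subprocess.run(
        ["docker", "ps", "--filter", "health=unhealthy", "--format", "{{.ID}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ids(out)


def restart_unhealthy() -> List[str]:
    """Restart each currently unhealthy container once and return the restarted IDs."""
    ids = unhealthy_container_ids()
    for cid in ids:
        subprocess.run(["docker", "restart", cid], check=True)
    return ids
```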

#11 fbartels (App Dev)

@girish said:

> I would ideally like to remove Cloudron's healthcheck field and replace it with Docker's own HEALTHCHECK

                          +1 for that

#12 mehdi (App Dev)

@Lonk said:

> Box / the dashboard would /repair the app. It's an undocumented Cloudron endpoint, and it always fixes any issue I have with apps, unlike simply restarting them.

But what does this /repair do? I am not clear on how an endpoint can magically repair an app in all cases..

#13 Lonkle

@mehdi said:

> @Lonk said:
>
> > Box / the dashboard would /repair the app. It's an undocumented Cloudron endpoint, and it always fixes any issue I have with apps, unlike simply restarting them.
>
> But what does this /repair do? I am not clear on how an endpoint can magically repair an app in all cases..

It destroys the container and rebuilds it (it's undocumented, but it exists). If there's something wrong with the container, this will fix it. If there's something wrong with NGINX, this will fix it.

Restarting the container can help, but only about 10% of the time, whereas the repair endpoint fixes a “Not Responding” app for me about 90% of the time.

#14 nebulon (Staff)

The idea of repair is that it is done consciously and ideally monitored by the admin. So I don't think it is good to just run it for good measure.

I think it would be more important to get down to why things failed and then see how to prevent that instead. This may mean we have to add better logging or status reporting.

#15 Lonkle (edited)

@nebulon said:

> The idea of repair is that it is done consciously and ideally monitored by the admin. So I don't think it is good to just run it for good measure.
>
> I think it would be more important to get down to why things failed and then see how to prevent that instead. This may mean we have to add better logging or status reporting.

The reason I brought this up in the forum is that I was going to write a script that checked the health status of all my apps and ran /repair on them when they switched to Not Responding, which happens about 5 minutes after they actually go down.

Then I realized: what is the healthcheck for, if not to react when an app goes down? I'm not saying don't log it in notifications. But if it's a one-off, it didn't need human intervention to get working again (and that intervention could take a while if you only check the Dashboard infrequently).

So, give us the info, sure, that's not a bad idea. But automatically trying to repair the app first, and noting in the notifications that it was repaired after being down, is a better solution. Keeping this a manual step so that the admin "knows about it" is silly for a random one-off, and irrelevant if they get a notification anyway. If /repair works, all the admin has to do is review their notifications. It could run a single time to keep the site up and then notify the admin; that sounds best IMO.

Better logging, for sure, I can agree with that. But auto-repair (just one try) followed by a notification saying whether the repair succeeded is better. Much better. It's the reason healthchecks exist: native Docker now has "restart on failed healthcheck" (as does supervisor), but you can take it beyond that with /repair.

I can still write my script. I can even keep it on the same Cloudron instance, monitoring the status of all the apps and running /repair on one if it "stops responding". If it starts responding again, cool; I'll check the logs as soon as I can to make sure it doesn't happen again. If it doesn't start responding after that, I'll find the problem and fix it.

I just don't get the stance: "we could automate the first troubleshooting step every admin takes to get their site up and running again, but we don't think it's a good idea because we want them to take that step intentionally." My question is: why, when you could be losing precious uptime?

#16 Lonkle

It took me a while to find that /repair endpoint, and I think it's brilliant. Not even Docker's built-in auto-restart is as thorough. So why keep a site down when it could automatically be up again? That's the point of HEALTHCHECKs IMO.

#17 Lonkle (edited)

Cloudron already restarts an app when it starts using too much memory. Cloudron could instead just stop that app and notify the admin that it was stopped, so they can start it back up manually (which, of course, sounds ridiculous, and I don't see the difference between that and this). It doesn't do that, though; it restarts the app. Why? I'd say to keep the app running.

So not attempting even a single /repair when an app stops responding, just so the admin can do it manually, feels needlessly manual and doesn't keep the app running when it could be (say the admin is the only admin and is on vacation for a week; this feature could save them).

#18 nebulon (Staff)

I still think we should not hide issues that may be fixed by running repair. It's a bit like putting a bandage over an issue lurking in the dark. Generally, if repair does indeed fix the issue, this indicates problems in the app package or in the platform's runtime management of apps, or that external dependencies (like DNS setup) should be handled better.

This is a bit like the Docker issues we encounter every now and then. Sometimes restarting Docker fixes them, but the conclusion should probably not be to restart Docker for good measure whenever an app fails.

Maybe it's just me, but I haven't seen many systems that work well in a self-healing manner, and I am afraid of hiding real bugs this way. To me, this conversation should be about the issue that made you think of automatically running repair in the first place.

#19 Lonkle (edited)

Replying to @nebulon:

1. My goal is not to hide anything from the admin, just to give apps higher uptime by cutting out a human step. I don't see a difference from how you handle an app running out of memory: post a notification on the Dashboard and display the error:

   "App was not responding due to x error. Cloudron automatically repaired the app and it's running, but please look at the logs here to see what might have caused this." (The log screen should show the last part of the log file from before the app stopped responding.)

2. For me personally, this has only happened once. A WordPress blog that wasn't even in use (no plugins yet, aside from the pre-installed ones) stopped responding, and a restart didn't fix it. A repair did, though. I still don't know what was wrong, but it hasn't gone down since (it's been a month now). So this feature is more of a comfort for me, knowing that edge case would be covered when I'm on "vacation for a week."

3. As a developer, I don't want band-aids; I want to fix the problem. But if your notification system gives me the date, time, and logs of when it had to /repair the app, I can still do that, with only about 5 minutes of downtime for my app. The alternative is that I, a human, eventually get around to checking the logs and figuring out what happened, and meanwhile I can't repair the app without adding more to the log. So a snapshot of the log plus a single automatic attempt to repair the app seems like a better flow for users and developers alike. Personally, I don't mind an app stopping, since I only use Cloudron for development, but I'm thinking more about your everyday users. With the repair notification I'm proposing, their app doesn't stay down, which is good, and they can still post the logs and the app name on the forums (mine was WordPress Unmanaged, and like I said, it only happened once).

4. As @fbartels mentioned, supervisor does things like this. And as @girish mentioned, the latest Docker itself now does this. So this is definitely a valid suggestion: why would all these other app managers do it if it weren't useful? We can make it even more useful by snapshotting the logs and adding a notification saying that something may need to be fixed even though an auto-repair succeeded (then you click the snapshotted logs and dig in).
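
The flow Lonkle describes (freeze the last part of the log, make one repair attempt, then leave a notification with the outcome) can be sketched as follows. This is a hypothetical Python illustration only: `LogBuffer`, `Notification`, `repair_with_snapshot` and the `repair` hook are invented names, and nothing here reflects how Cloudron's notification or logging system actually works.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Deque, List


@dataclass
class Notification:
    """What the admin would see after an automatic repair attempt (hypothetical)."""
    app_id: str
    when: datetime
    repaired: bool
    log_snapshot: List[str]

    def message(self) -> str:
        status = ("was repaired automatically" if self.repaired
                  else "could not be repaired automatically")
        return (f"{self.app_id} stopped responding and {status}; "
                f"{len(self.log_snapshot)} log lines saved for review.")


class LogBuffer:
    """Keeps only the most recent `max_lines` log lines of one app."""

    def __init__(self, max_lines: int = 200) -> None:
        self._lines: Deque[str] = deque(maxlen=max_lines)

    def append(self, line: str) -> None:
        self._lines.append(line)  # the oldest line is dropped once the buffer is full

    def snapshot(self) -> List[str]:
        return list(self._lines)  # a copy, so later log lines do not alter it


def repair_with_snapshot(app_id: str, logs: LogBuffer,
                         repair: Callable[[str], bool]) -> Notification:
    """Freeze the log tail, make one repair attempt, and report the outcome."""
    snapshot = logs.snapshot()  # taken first so output from the repair is not mixed in
    ok = repair(app_id)         # hypothetical hook: True if the app is healthy again
    return Notification(app_id, datetime.now(timezone.utc), ok, snapshot)
```

Taking the snapshot before the repair is the important ordering here: it preserves what the app logged while it was failing, which is what an admin would want to review or post on the forum afterwards.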

#20 ruihildt (edited)

Today, after the update to the latest Cloudron version, I had 15 to 20 apps in a failed state.

Clicking repair fixed every one of them. And I had a client message me about downtime. I didn't look into it any further, having spent at least 30 minutes going into each not-responding app's settings individually, clicking repair, and waiting for it to come back online. (And I wish there was a button to repair all apps at once ^^)

As much as I agree with not band-aiding issues, for my client's sake and my reputation, I wish the not-responding apps had been repaired automatically and the errors reported.

Given the choice between repairing an app to get it running and starting an asynchronous debugging session on the forum, I'll always click the repair button first.

If repairing is detrimental to bug solving/reporting, I would suggest providing a place where the not-responding logs/errors can be retrieved in one click.
