Shouldn't we get an alert when a service container fails / is unhealthy?

      nebulon (Staff) wrote (#14):

      Alright, thanks for the debugging. I think the next step, when you hit this situation again, is to check whether you can manually curl the healthcheck of the affected container. All services inside it seem fine, so this may be a Docker network issue.

      You can find the IP using docker ps and docker inspect <rediscontainer>:

      curl -v http://<rediscontainerIP>:3000/healthcheck
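To make the docker inspect step concrete, here is a small sketch in Python that pulls the container IP out of the JSON that docker inspect prints for a container. The sample JSON and the network name `cloudron` are illustrative assumptions, not taken from this thread; the helper name `container_ips` is hypothetical as well.

```python
import json


def container_ips(inspect_json: str) -> dict[str, str]:
    """Map network name -> IP address from `docker inspect` JSON output.

    `docker inspect` prints a JSON array with one object per container;
    the per-network addresses live under NetworkSettings.Networks.
    """
    data = json.loads(inspect_json)
    networks = data[0]["NetworkSettings"]["Networks"]
    return {
        name: net["IPAddress"]
        for name, net in networks.items()
        if net.get("IPAddress")
    }


# Illustrative sample only: the network name "cloudron" is an assumption.
sample = json.dumps(
    [{"NetworkSettings": {"Networks": {"cloudron": {"IPAddress": "172.18.0.4"}}}}]
)
ips = container_ips(sample)
```

The resulting address is what goes into the curl command above in place of the placeholder.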
      
        d19dotca wrote (#15):

        @nebulon Okay, here is what I see when I run that healthcheck GET while the yellow light is showing again, which is blocking the backup process. It does indeed look like a possible connection issue:

        root@my:~# curl -v http://172.18.0.4:3000/healthcheck
        *   Trying 172.18.0.4:3000...
        * connect to 172.18.0.4 port 3000 from 172.18.0.1 port 44402 failed: Connection timed out
        * Failed to connect to 172.18.0.4 port 3000 after 135822 ms: Couldn't connect to server
        * Closing connection
        curl: (28) Failed to connect to 172.18.0.4 port 3000 after 135822 ms: Couldn't connect to server
        

        When I run it against a working redis instance showing the green light, I get the expected response:

        root@my:~# curl -v http://172.18.0.7:3000/healthcheck
        *   Trying 172.18.0.7:3000...
        * Connected to 172.18.0.7 (172.18.0.7) port 3000
        > GET /healthcheck HTTP/1.1
        > Host: 172.18.0.7:3000
        > User-Agent: curl/8.5.0
        > Accept: */*
        > 
        < HTTP/1.1 401 Unauthorized
        < X-Powered-By: Express
        < Content-Type: application/json; charset=utf-8
        < Content-Length: 2
        < ETag: W/"2-vyGp6PvFo4RvsFtPoIWeCReyIC8"
        < Date: Thu, 17 Apr 2025 18:43:37 GMT
        < Connection: keep-alive
        < Keep-Alive: timeout=5
        < 
        * Connection #0 to host 172.18.0.7 left intact
        

        Any recommendations? I never had these issues this frequently until the past month or two, and I'm not sure why this is happening. It's unclear whether this is something in my environment (it seems like others are not seeing this issue) or a Cloudron thing. I'm certainly open to this being an issue in my environment only, I'm just not sure what to try next.

        --
        Dustin Dauncey
        www.d19.ca

          d19dotca wrote (#16):

          @nebulon / @girish, is there a possibility that this is related to https://forum.cloudron.io/post/103522 at all? I wouldn't think so, since this seems to be intermittent and only affects one container at a time (usually redis, it seems), but I wanted to make sure you were aware in case this is the root cause.

            d19dotca wrote (#17):

            This looks like a port binding or Docker network issue: the health check GET requests succeed for working containers but time out for the non-working ones. However, from inside a non-working container, the health check against localhost works fine. So redis itself is running correctly; the problem appears to be that the container can't be reached from the host for some reason.
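For anyone repeating this check, here is a minimal sketch of a reachability probe in Python with a short timeout, so a blocked container shows up in seconds rather than after curl's roughly two-minute connect timeout. The function name `can_connect` and the 3-second default are my own choices for illustration; only port 3000 and the 172.18.0.x addresses come from the outputs in this thread.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


# Example usage from the host (addresses as seen earlier in the thread):
#   can_connect("172.18.0.4", 3000)   # the container that timed out
#   can_connect("172.18.0.7", 3000)   # the container that responded
```

Running the same probe from the host and from inside another container gives the host-versus-container comparison described above. It only tests the TCP connection, not the HTTP response.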

              d19dotca wrote (#18):

              When I exec into the working container and run the health check against the broken container, it succeeds. This confirms to me that the problem lies somewhere between the host network and the container, whereas container-to-container traffic works perfectly fine.

              From the working container to the non-working container:

              root@redis-2e36b3e8-22d9-477c-bd56-d2c284909932:/app/code# curl -v http://172.18.0.2:3000/healthcheck
              *   Trying 172.18.0.2:3000...
              * Connected to 172.18.0.2 (172.18.0.2) port 3000
              > GET /healthcheck HTTP/1.1
              > Host: 172.18.0.2:3000
              > User-Agent: curl/8.5.0
              > Accept: */*
              > 
              < HTTP/1.1 401 Unauthorized
              < X-Powered-By: Express
              < Content-Type: application/json; charset=utf-8
              < Content-Length: 2
              < ETag: W/"2-vyGp6PvFo4RvsFtPoIWeCReyIC8"
              < Date: Fri, 18 Apr 2025 05:24:05 GMT
              < Connection: keep-alive
              < Keep-Alive: timeout=5
              < 
              * Connection #0 to host 172.18.0.2 left intact
              

              From the non-working container to the working container:

              root@redis-00895422-a1ff-4196-8bb8-cb4ff8d6eeaa:/app/code# curl -v http://172.18.0.3:3000/healthcheck
              *   Trying 172.18.0.3:3000...
              * Connected to 172.18.0.3 (172.18.0.3) port 3000
              > GET /healthcheck HTTP/1.1
              > Host: 172.18.0.3:3000
              > User-Agent: curl/8.5.0
              > Accept: */*
              > 
              < HTTP/1.1 401 Unauthorized
              < X-Powered-By: Express
              < Content-Type: application/json; charset=utf-8
              < Content-Length: 2
              < ETag: W/"2-vyGp6PvFo4RvsFtPoIWeCReyIC8"
              < Date: Fri, 18 Apr 2025 05:25:40 GMT
              < Connection: keep-alive
              < Keep-Alive: timeout=5
              < 
              * Connection #0 to host 172.18.0.3 left intact
              

              But from the host, the working and non-working redis containers give the results shown above at https://forum.cloudron.io/post/105899

                d19dotca wrote (#19):

                OMG, I think I figured it out, and I'm really kicking myself. I threw a bunch of Cloudron logs into ChatGPT for review, and it highlighted this line:

                2025-04-13T01:02:17.357Z box:shell network /usr/bin/sudo -S /home/yellowtent/box/src/scripts/setblocklist.sh

                I believe this comes from the IP blocklist workflow I implemented based on https://docs.cloudron.io/guides/community/blocklist-updates/. I took a look at the concatenated list of IP addresses and, sure enough, it includes 172.18.0.2 and 172.18.0.4, the addresses of the two redis containers that are currently failing. I think this explains it: these IPs somehow made it onto the blocklist from the firehol lists. I'm going to determine which list specifically added them so I can remove it from my IP blocklists. As soon as I cleared the IP blocklist in Cloudron, everything worked immediately. 🙂
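For the "which list added it" step, a rough sketch in Python: given the contents of each downloaded list, it reports which ones cover a given address, either as a single IP or inside a CIDR range. The function name `lists_covering` and the sample list names and entries are made up for illustration; how the lists are downloaded and stored is left out entirely.

```python
import ipaddress


def lists_covering(ip: str, lists: dict[str, list[str]]) -> list[str]:
    """Return the names of the lists that contain `ip` directly or via a CIDR."""
    target = ipaddress.ip_address(ip)
    hits = []
    for name, entries in lists.items():
        for raw in entries:
            entry = raw.split("#", 1)[0].strip()  # drop comments and blanks
            if not entry:
                continue
            try:
                net = ipaddress.ip_network(entry, strict=False)
            except ValueError:
                continue  # skip lines that are not IPs or CIDRs
            if net.version == target.version and target in net:
                hits.append(name)
                break
    return hits


# Hypothetical sample data, not the real list contents.
sample_lists = {
    "list_a": ["# some header", "203.0.113.0/24"],
    "list_b": ["198.51.100.7", "172.18.0.2", "172.18.0.4"],
}
culprits = lists_covering("172.18.0.2", sample_lists)
```

Checking each failing container IP this way narrows the problem down to the offending list without having to clear the whole blocklist.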

                Sorry for the blast of comments, but figured it may help others who run into similar issues. 🙂 I'm just glad we sorted it out.

                  d19dotca wrote (#20):

                  Just FYI, the botscout_30d ipset is the one that contained the 172.18.0.2 and 172.18.0.4 addresses. So I guess the botscout_30d list cannot be fully trusted for now.

                    nebulon (Staff) wrote (#21):

                    Thanks for sharing your journey towards what was, luckily, a simple fix in the end! It's good to keep the blocklist in mind when hitting such seemingly random issues.


                      d19dotca wrote (#22):

                      @nebulon I’m wondering: since Cloudron depends entirely on Docker networking to function, is there maybe room to improve the IP blocklist checking so that it ignores any entries of the current Docker networking ranges such as 172.18.xxx.xxx addresses? It feels to me like there would never be a use case for blocking those. While we certainly need to use reliable IP lists (lesson learned, haha), I also wonder if this feature could be improved in the future to ignore any private IPs, or at least any Docker IPs.


                        girish (Staff) wrote (#23):

                        @d19dotca said in Shouldn't we get an alert when a service container fails / is unhealthy?:

                        checking so that it ignores any entries of the current Docker networking ranges such as 172.18.xxx.xxx addresses

                        For the moment, I have added validation to not accept such addresses. Ignoring them properly is more complicated (since we have to filter at apply time), but at least it won't let you save such a blocklist easily.
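To illustrate the idea of save-time validation, here is a sketch in Python. It is not Cloudron's actual implementation, which is not shown in this thread. It assumes the Docker network is 172.18.0.0/16 (the thread only shows individual 172.18.0.x addresses, so the /16 is a guess), and the names `PROTECTED`, `offending_entries` and `validate_blocklist` are hypothetical.

```python
import ipaddress

# Assumption: the Docker network behind the 172.18.0.x addresses is a /16.
PROTECTED = (ipaddress.ip_network("172.18.0.0/16"),)


def offending_entries(blocklist_text: str, protected=PROTECTED) -> list[str]:
    """Return blocklist entries (IPs or CIDRs) that overlap a protected network."""
    bad = []
    for line in blocklist_text.splitlines():
        entry = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not entry:
            continue
        net = ipaddress.ip_network(entry, strict=False)  # ValueError on junk
        if any(net.version == p.version and net.overlaps(p) for p in protected):
            bad.append(entry)
    return bad


def validate_blocklist(blocklist_text: str) -> None:
    """Refuse a blocklist that would block the protected (Docker) range."""
    bad = offending_entries(blocklist_text)
    if bad:
        raise ValueError("blocklist overlaps protected range: " + ", ".join(bad))
```

Note that a broad entry such as 172.16.0.0/12 is rejected too, because it contains the assumed Docker range. The same check could be extended to all private ranges, which is the broader suggestion made in #22.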
