Cloudron Forum

Server crashes caused by stopped app's runner container stuck in restart loop

Unsolved · Support · domains, cron
14 Posts · 4 Posters · 513 Views
mendoksai wrote (#5):

    Quick update — I just noticed cloudron-support --troubleshoot was reporting:

    [FAIL] Database migrations are pending. Last migration in DB: /20260217120000-mailPasswords-create-table.js
    

    This migration has been pending since Feb 17 — which is exactly when the instability started. I missed this earlier. Just applied it:

    cloudron-support --apply-db-migrations
    [OK] Database migrations applied successfully
    

    I've also stopped the Mattermost container that was in a restart loop (it was failing to connect to MySQL on boot and never recovering).

    Will monitor for the next few days and report back. Fingers crossed this was the missing piece.
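    For reference, this is how I spotted the restart loop — generic Docker commands, nothing Cloudron-specific:

    ```shell
    # List containers Docker currently reports as restarting.
    docker ps --filter "status=restarting" --format '{{.ID}}  {{.Names}}  {{.Status}}'

    # Cross-check how often each container has been restarted by the daemon.
    docker ps -aq | xargs -r docker inspect --format '{{.Name}} restarted {{.RestartCount}} times'
    ```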


joseph (Staff) wrote (#6):

      @mendoksai said:

      Quick update — I just noticed cloudron-support --troubleshoot was reporting:

      [FAIL] Database migrations are pending. Last migration in DB: /20260217120000-mailPasswords-create-table.js

      This is a bug in the tool and not a real problem. It's fixed in 9.1.5.

mendoksai wrote (#7):

It happened again. It keeps recurring every few days. 😕

nebulon (Staff) wrote (#8):

Have you by any chance made any custom modifications to your Ubuntu system, such as applying apt updates?

mendoksai wrote (#9):

Yes, I followed your upgrade docs, as you recommend upgrading because support for the old Ubuntu version is being discontinued. This problem has been happening ever since. And it just happened again, right now. Twice today.

mendoksai wrote (#10):

              @nebulon Yes, here's the full timeline of changes:

              1. Server was stable on Ubuntu 20.04 + kernel 5.4 for months
              2. Upgraded to Ubuntu 22.04 + kernel 5.15 (following Cloudron upgrade docs) — instability started
              3. Upgraded to Ubuntu 24.04 + kernel 6.8 (following Cloudron upgrade docs) — issue persists
              4. Installed fail2ban and smartmontools via apt
              5. No other custom modifications

              All upgrades were done following the official Cloudron documentation. The crashes happen on both kernel 5.15 and 6.8, so it doesn't seem kernel-specific.

              One thing that may be relevant: Docker is using cgroupfs driver with cgroup v2. The Cloudron systemd unit explicitly sets --exec-opt native.cgroupdriver=cgroupfs. Could there be a compatibility issue with Ubuntu 24.04's default cgroup v2?
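    For context, this is how I'm reading the cgroup setup — standard `docker info` fields plus a kernel-side check:

    ```shell
    # Show the cgroup driver and cgroup version the Docker daemon is running with.
    docker info --format 'driver={{.CgroupDriver}} version={{.CgroupVersion}}'

    # Cross-check what the kernel actually mounted:
    # "cgroup2fs" here means a pure cgroup v2 hierarchy.
    stat -fc %T /sys/fs/cgroup/
    ```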

              The server just crashed again twice in one hour. Happy to provide SSH access if that would help debug this. This is urgent as my mail server runs on this machine.

mendoksai wrote (#11):

                Update: I renewed the expired domain and the app (Lychee) is now running properly. No containers in restart loop currently. The earlier crashes today were likely caused by the runner container still being in a stale state from before the domain renewal.

                I have a cron job cleaning up zombie runners every 5 minutes, which seems to be working (log shows it removed 5 runners since setup).
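    For anyone curious, the cron job is roughly this sketch. The function name and the `runner` match pattern are specific to my setup, and the Docker pipeline is kept in a comment so the filter can be checked standalone:

    ```shell
    #!/bin/sh
    # Sketch of my zombie-runner cleanup job (adapt the "runner" name pattern).
    # Input lines look like: "ID NAME STATE", e.g. "abc123 lychee-runner restarting",
    # which is what `docker ps -a --format '{{.ID}} {{.Names}} {{.State}}'` emits.
    find_stuck_runners() {
        awk '$2 ~ /runner/ && $3 == "restarting" { print $1 }'
    }

    # When run for real (from cron every 5 minutes), feed it live Docker state:
    #   docker ps -a --format '{{.ID}} {{.Names}} {{.State}}' \
    #       | find_stuck_runners | xargs -r docker rm -f
    ```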

                Will monitor for the next few days and report back. If it stays stable, I'll mark this as resolved.

                Thank you @girish @nebulon @joseph for your help!

mendoksai wrote (#12):

                  @girish @nebulon Server crashed again last night. But this time the pattern is different — no containers in restart loop, no runner issues. The cron cleanup job is working. All containers were stable (Up 11 hours) before the crash.

                  The Docker journal shows the DNS resolver dying on its own:

                  23:38 - External DNS timeouts begin (185.12.64.2)
                  23:57 - Internal Docker DNS fails (172.18.0.1:53 i/o timeout)
                  23:59 - [resolver] connect failed: dial tcp 172.18.0.1:53: i/o timeout
                  00:xx - Server becomes unresponsive
                  

                  There's also a container (different ID each time) producing "ignoring event" / "cleaning up dead shim" messages every minute — not sure if related.

                  This happens roughly at the same time every night (~23:00-00:00 UTC). All previous fixes applied (no restart loops, domain renewed, hardware clean). I'm running out of ideas on my end.
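    The timeline above came from filtering the Docker journal for the crash window, roughly like this (generic journalctl flags; the grep pattern just matches the log lines quoted above):

    ```shell
    # Pull Docker daemon logs for the nightly crash window and keep only
    # the resolver / timeout / shim-cleanup lines.
    journalctl -u docker.service --since "23:00" --until "00:30" --no-pager \
        | grep -Ei 'resolver|i/o timeout|dead shim'
    ```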

                  Would it be possible to get SSH-level support to debug this? I can provide access anytime. This is really urgent as it's been impacting my mail service daily for weeks now.

                  Thank you.


joseph (Staff) wrote (#13):

                    @mendoksai yes, write to me at support@cloudron.io . I can investigate.

mendoksai wrote (#14):

                      Server was stable for 14 days after I fixed the DNS configuration myself. The original daily crash issue was resolved.

                      This morning I received Cloudron's security reboot email. Rebooted via dashboard. Server never came back. Ping responds, SSH returns kex_exchange_identification: Connection reset by peer. Hard reset via Hetzner Robot didn't help either.
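    For anyone in the same spot, my rough recovery plan from Hetzner rescue mode — the device name is an assumption for a single-disk layout, so verify with `lsblk` first:

    ```shell
    # From the Hetzner rescue system: mount the installed root and read the
    # previous boot's logs to see why SSH resets connections.
    # /dev/sda1 is an assumption; check your actual layout with lsblk.
    mount /dev/sda1 /mnt
    chroot /mnt journalctl -b -1 -u ssh --no-pager | tail -n 50

    # Check whether unattended-upgrades was mid-flight when the reboot hit.
    tail -n 20 /mnt/var/log/unattended-upgrades/unattended-upgrades.log 2>/dev/null
    ```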

                      So now I'm locked out of my own server because of an automatic security update that I didn't ask for and had no control over. My mail server is down, again.

                      I have to ask: is anyone actually testing these updates before pushing them? Every major issue I've had in the past two months has been triggered by an automatic update or upgrade. The previous instability started after a Cloudron update in February. Now this.

                      I need:

                      1. Help getting my server back online — I'll likely need to use Hetzner rescue mode
                      2. A way to permanently disable automatic security updates so I can apply them manually at a time that works for me
                      3. Some assurance that updates are being properly tested before being pushed to production servers

                      This is a production server running critical mail services. I can't keep being the QA tester for untested updates.

                      Are you guys vibe coding?
