    Cloudron Forum

    Solved server down: apps not restarting

    Support
    • chymian 0
      chymian 0 last edited by

      hey,
      today I'm experiencing a major downtime with nearly all cloudron apps.
      trying to restart the apps gives the following error:
      An error occurred during the restart app operation: server error: (HTTP code 500) server error - Cannot restart container 9b309fe6-a68b-47d5-86d7-2797aa67ccdd: failed to create OCI runtime console socket: mkdir /tmp/pty465273103: no space left on device: unknown

      reboot did not change the situation.
      most apps are down.
      there was NO notification!!

      other things that do not work are bash-completion, apt, etc.
      they all use /tmp, which seems too small/full.

      from the sysadmin side, I did not make any changes to the setup in the last week.

      Cloudron 5.0.6 runs on a VM with 8 GB RAM and 2 CPUs
      Ubuntu 18.04

      TIA
      guenter

      • subven
        subven last edited by

        @chymian-0 said in server down: apps not restarting:

        create OCI runtime console socket: mkdir /tmp/pty465273103: no space left on device: unknown

        Since rebooting worked, I assume you have disk space left on the system? Do you have SSH (root) access to the server? Does the Cloudron web SSH console work? This looks like an issue with cgroups to me. Does your server come with any kind of limitation?

        Please post the output of:

        • df -h
        • df -i
        • cat /proc/cgroups
        • docker info; echo; echo;
        • chymian 0
          chymian 0 last edited by

          @subven,
          that's all checked. no problems there.
          /tmp & /dev/pts are pseudo-filesystems and are not managed via fstab.
          they are too small.

          @girish can you pls. check on this. services are down for 24h now

          • necrevistonnezr
            necrevistonnezr last edited by

            Have you contacted the developers via their email (support @ cloudron)? They usually get back within hours. I think that's faster than via this discussion forum.

            • scooke
              scooke last edited by scooke

              It could be that the system has changed since I had a similar problem in July 2018, but in my case old kernel images had remained in /boot when they should have been deleted. I suggest not running any command that changes anything, because I have no idea if the commands are still relevant in 2020. I had to shut down the Cloudron:

              sudo systemctl stop box
              sudo systemctl stop docker
              

              then run this command to list the installed kernel images other than the running one:

              sudo dpkg --list 'linux-image*'|awk '{ if ($1=="ii") print $2}'|grep -v `uname -r`
              

              Then, when it was obvious my /boot was stuffed to the gills with prior linux-images, I had to remove old kernels, adjusting the below to the results from above:

              sudo rm -rf /boot/*-4.4.0-{98,97,96,93,62}-*
              

              Then, automatically remove unneeded kernels

              sudo purge-old-kernels
              

              After that, bring Cloudron back online:

              sudo systemctl restart box
              sudo systemctl restart docker
              sudo systemctl restart cloudron.target
              

              The fact that this occurred in /boot, and not in the main partition, had thrown us for a little while.

              A life lived in fear is a life half-lived

              • scooke
                scooke last edited by

                It could also be you are storing backups locally. You can check in the backup tab yourcloudron.com/#/backups. If so, you will have to delete those somehow. The one line in your error message certainly points to the main culprit: no space left on device. You need to figure out what's using up the space.

                • subven
                  subven @chymian 0 last edited by subven

                  @chymian-0 said in server down: apps not restarting:

                  /tmp & /dev/pts are pseudo-filesystems and are not managed via fstab.
                  they are too small.

                  And

                  mkdir /tmp/pty465273103: no space left on device: unknown

                  Have you checked whether /tmp is mounted correctly and is writable? It should appear in df -h even if it is a pseudo-filesystem. Since you provided no information, it's hard to help you. Please note that support time is expensive and Cloudron's support only covers problems that are directly caused by Cloudron. In addition, time spent on support cannot be used for development, so it is in our best interest to help you here.
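
The /tmp check asked for above can be sketched roughly as follows. This is only an illustration: `check_dir_writable` is a made-up helper name, and it assumes a POSIX shell with `df` and `mktemp` available. Note that creating a file also fails when a filesystem has free blocks but no free inodes, which would produce the same "no space left on device" message.

```shell
# Hypothetical helper: show which filesystem backs a directory and
# test whether a new file can actually be created there.
check_dir_writable() {
    dir="${1:-/tmp}"
    # Last line of df output = the filesystem that holds $dir
    df -hP "$dir" 2>/dev/null | tail -n 1
    # Creating a file needs both a free block and a free inode
    probe=$(mktemp "$dir/probe.XXXXXX" 2>/dev/null) || {
        echo "cannot create files in $dir"
        return 1
    }
    rm -f "$probe"
    echo "$dir is writable"
}

# Usage (not run automatically):
#   check_dir_writable /tmp
```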

                  • chymian 0
                    chymian 0 last edited by chymian 0

                    @subven
                    yes, sure 😉
                    /tmp is not a tmpfs, it's on root, and there are GBs free.
                    it seems to have to do with cgroups and the space within the containers.
                    when the system containers and one app are running, it's exhausted.
                    I tried an older kernel, same result.
                    ??

                    thx everybody for trying to help.

                    I think that's a pure cloudron/system/cgroup problem, as I haven't touched that system.
                    and I never came across it on my various other docker projects/servers.

                    • girish
                      girish Staff last edited by girish

                      @chymian-0 Sure, will be happy to take a look immediately. Are you able to run cloudron-support --enable-ssh and then send a mail to support@cloudron.io with your domain name/IP ?

                      If that command doesn't work, put our ssh keys in your /root/.ssh/authorized_keys (https://cloudron.io/documentation/support/#ssh-keys)

                      • chymian 0
                        chymian 0 @girish last edited by

                        @girish
                        hey, thanks for help.
                        I already sent a mail to support with the info, a few hours ago. Didn't you receive that?
                        it's from an ...r@gmx.net address.

                        • girish
                          girish Staff last edited by

                          @chymian-0 Got it, will look into it shortly.

                          • girish
                            girish Staff last edited by

                            @chymian-0 From what I can tell, there is inode exhaustion in the rootfs. If you run df -i, it tells you that you have run out of inodes. I think this is because this is run on top of btrfs. btrfs is notorious for this. We used to use btrfs on Cloudron 2-3 years ago and gave up because there was always some issue or other like this. You can run btrfs balance from outside the Cloudron to free up some space, but I am not a btrfs expert.
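
As a rough sketch of the df -i check described above, the following filters `df -iP` output for mount points at or above a given inode usage. `inode_alert` is a hypothetical helper name, the 90% default is an arbitrary assumption, and filesystems that report no inode percentage (shown as `-`) are simply skipped.

```shell
# Hypothetical helper: read `df -iP` style output on stdin and print
# mount points whose inode usage is at or above a threshold (default 90).
inode_alert() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 && $5 ~ /^[0-9]+%$/ {
        pct = $5
        sub(/%/, "", pct)          # strip the percent sign
        if (pct + 0 >= t) print $6, pct "%"
    }'
}

# Usage (not run automatically):
#   df -iP | inode_alert 90
```

Reading from stdin keeps the filter separate from the `df` call, so it can also be pointed at saved output from a machine that is already too full to run much else.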

                            • girish
                              girish Staff last edited by girish

                              @chymian-0 The easiest fix is to just give the rootfs more space. Is this possible?

                              Here's some discussion about it - https://lwn.net/Articles/724522/

                              • girish
                                girish Staff last edited by

                                So, the issue here was that nullmailer was installed and was busy creating mails forever (lots and lots of files). Removing that software fixed the problem.

                                • chymian 0
                                  chymian 0 last edited by chymian 0

                                  kudos to @girish
                                  he found the real problem (out of inodes) within minutes.
                                  from there, we could nail down the cause by following this:
                                  https://unix.stackexchange.com/questions/26598/how-can-i-increase-the-number-of-inodes-in-an-ext4-filesystem

                                  TL;DR:
                                  one cannot raise the number of inodes after fs creation. normally, one would have to tar the rootfs, reformat it, and restore.
                                  but to find out what is consuming all the inodes, one can do the following:

                                  try du -s --inodes * 2>/dev/null |sort -g then cd into the last dir in output and repeat.

                                  Full Disclosure: not all OS's support --inodes flag for du command (my Mac OS does not) but many Linux OS's do.

                                  one has to cd into the dir with the most inodes, recursively going down the tree, until one finds the dir with the biggest inode consumption.
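
One level of the drill-down described above could be wrapped like this. It is only a sketch: `top_inode_dirs` is a hypothetical name, and it assumes GNU `find` and a GNU `du` that supports `--inodes`. Unlike the `*` glob, it also includes hidden directories, and `-x` keeps `du` from crossing into other filesystems.

```shell
# Hypothetical helper: list the N subdirectories of DIR that use the
# most inodes, largest last. Run it again on the last line's path to
# descend one level further.
top_inode_dirs() {
    dir="${1:-.}"
    n="${2:-5}"
    find "$dir" -mindepth 1 -maxdepth 1 -type d \
        -exec du -s --inodes -x {} + 2>/dev/null |
        sort -n | tail -n "$n"
}

# Usage (not run automatically):
#   top_inode_dirs / 5
#   top_inode_dirs /var 5
```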

                                  in this case, as girish had mentioned, it was caused by a misconfigured nullmailer writing tons of error messages to /var/spool/nullmailer/failed, using 4.4M inodes…
                                  deleting that dir eased the situation immediately.
                                  rebooting the server and restarting all failed apps (GUI & CLI) fixed it.

                                  thanks for all your help
