server down: apps not restarting

chymian 0

hey,
today I'm experiencing a major downtime with nearly all cloudron apps.
trying to restart the apps gives the following error:
An error occurred during the restart app operation: server error: (HTTP code 500) server error - Cannot restart container 9b309fe6-a68b-47d5-86d7-2797aa67ccdd: failed to create OCI runtime console socket: mkdir /tmp/pty465273103: no space left on device: unknown

reboot did not change the situation.
most apps are down.
there was NO notification!!

other things that do not work are bash-completion, apt, etc.
they all use /tmp, which seems too small/full.

from the sysadmin side, I did not make any changes to the setup in the last week.

Cloudron 5.0.6 runs on a VM with 8 GB RAM and 2 CPUs
Ubuntu 18.04

TIA
guenter

subven

@chymian-0 said in server down: apps not restarting:

create OCI runtime console socket: mkdir /tmp/pty465273103: no space left on device: unknown

Since rebooting worked, I assume you have disk space left on the system? Do you have SSH (root) access to the server? Does the Cloudron web SSH console work? This looks like a cgroups issue to me. Does your server come with any kind of limitation?

Please post the output of:

df -h
df -i
cat /proc/cgroups
docker info; echo; echo;

chymian 0

@subven,
that's all checked. no problems there.
/tmp & /dev/pts are pseudo-filesystems and are not managed via fstab.
they are too small.

@girish can you pls. check on this. services are down for 24h now

necrevistonnezr

Have you contacted the developers via their email (support @ cloudron)? They usually get back within hours. I think that's faster than via this discussion forum.

scooke

It could be the system has changed since I had a similar problem in July 2018, but in my case old kernel images had remained in /boot when they should have been deleted. I suggest not running any command that changes anything, because I have no idea if the commands are still relevant in 2020. I had to shut down Cloudron:

sudo systemctl stop box
sudo systemctl stop docker

Then I ran this command to see whether old kernel images had been left behind:

sudo dpkg --list 'linux-image*'|awk '{ if ($1=="ii") print $2}'|grep -v `uname -r`

Then, when it was obvious my /boot was stuffed to the gills with prior linux-images, I had to remove old kernels, adjusting the below to the results from above:

sudo rm -rf /boot/*-4.4.0-{98,97,96,93,62}-*

Then, automatically remove unneeded kernels

sudo purge-old-kernels

After that, bring Cloudron back online:

sudo systemctl restart box
sudo systemctl restart docker
sudo systemctl restart cloudron.target

The fact that this occurred in /boot, and not in the main partition, had thrown us for a little while.

scooke

It could also be you are storing backups locally. You can check in the backup tab yourcloudron.com/#/backups. If so, you will have to delete those somehow. The one line in your error message certainly points to the main culprit: no space left on device. You need to figure out what's using up the space.
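One way to narrow down what is using the space is to list the largest directories one level at a time. The following is only a sketch: the helper name biggest_dirs is made up for illustration, and it assumes GNU du (for --max-depth). It reads sizes only and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: show the N largest entries (in KiB) one level below DIR.
# -x keeps du on a single filesystem, so a separate /boot is not mixed in.
biggest_dirs() {
    du -xk --max-depth=1 "${1:-/}" 2>/dev/null | sort -n | tail -n "${2:-10}"
}

# Example: biggest_dirs /var 5
```

The last lines of the output are the largest; the directory passed in appears too, since du reports its total as well. Run it again on the biggest subdirectory to go one level deeper.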

subven

@chymian-0 said in server down: apps not restarting:

/tmp & /dev/pts are pseudo-filesystems and are not managed via fstab.
they are too small.

And

mkdir /tmp/pty465273103: no space left on device: unknown

Have you checked if /tmp is mounted correctly and is writable? It should appear in df -h even if it is a pseudo-filesystem. Since you provided no information, it's hard to help you. Please note that support time is expensive and Cloudron's support only covers problems that are directly caused by Cloudron. In addition, time spent on support cannot be used for development, so it is in our best interest to help you here.

chymian 0

@subven
yes, sure
/tmp is not a tmpfs, it's on root, with GBs free.
it seems to have to do with cgroups and the space within the containers.
when the system containers and one app are running, it's exhausted.
I tried an older kernel, same result.
??

thx everybody for trying to help.

I think that's a pure cloudron/system/cgroup problem, as I haven't touched that system.
and I've never come across it on my various other docker projects/servers.

girish

@chymian-0 Sure, will be happy to take a look immediately. Are you able to run cloudron-support --enable-ssh and then send a mail to support@cloudron.io with your domain name/IP ?

If that command doesn't work, put our ssh keys in your /root/.ssh/authorized_keys (https://cloudron.io/documentation/support/#ssh-keys)

chymian 0

@girish
hey, thanks for help.
I already sent a mail to support with the info, a few hours ago. Didn't you receive that?
it's from an ...r@gmx.net address.

girish

@chymian-0 Got it, will look into it shortly.

girish

@chymian-0 From what I can tell, there is inode exhaustion in the rootfs. If you run df -i, it shows that you have run out of inodes. I think this is because this is run on top of btrfs. btrfs is notorious for this. We used to use btrfs on Cloudron 2-3 years ago and gave up because there was always some issue or other like this. You can run btrfs balance from outside the Cloudron to free up some space, but I am not a btrfs expert.
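A minimal sketch of the df -i check, wrapped in a small shell helper. The function name inode_report is made up for illustration; it only reads the output of df. When the percentage in the last field is at or near 100%, the filesystem is out of inodes, and mkdir fails with "no space left on device" even though df -h still shows free space.

```shell
#!/bin/sh
# Hypothetical helper: summarize inode usage for the filesystem holding PATH.
# df -P forces one line per filesystem; -i reports inodes instead of blocks.
inode_report() {
    df -Pi "${1:-/}" | awk 'NR == 2 {
        printf "%s: %s of %s inodes used (%s)\n", $6, $3, $2, $5
    }'
}

inode_report /
inode_report /tmp
```

Some filesystems do not report a fixed inode count, in which case df prints 0 or "-" in these columns and the check is less conclusive.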

girish

@chymian-0 The easiest fix is to just give the rootfs more space. Is this possible?

Here's some discussion about it - https://lwn.net/Articles/724522/

girish

So, the issue here was that there was nullmailer installed, which was busy creating mails forever (lots and lots of files). Removing that software fixed the problem.

chymian 0

kudos to @girish
he found the real problem (out of inodes) within minutes.
from there, we could nail down the cause, by following this:
https://unix.stackexchange.com/questions/26598/how-can-i-increase-the-number-of-inodes-in-an-ext4-filesystem

TL;DR:
one cannot raise the number of inodes after filesystem creation. normally, one would have to tar up the rootfs, reformat it, and restore.
but to find out what is consuming all the inodes, one can do the following:

try du -s --inodes * 2>/dev/null |sort -g then cd into the last dir in output and repeat.

Full Disclosure: not all OS's support --inodes flag for du command (my Mac OS does not) but many Linux OS's do.

one has to cd into the dir with the most i-nodes, recursively going down the tree and finally find the dir with the biggest i-node consumption.
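The manual descent described above can be sketched as a loop. This is an illustration only: the function name inode_hog and the stopping rule (stop once no single subdirectory holds at least half of the parent's inodes) are my own assumptions, not part of the linked answer. It needs GNU du with --inodes and GNU find, and it only reads.

```shell
#!/bin/sh
# Hypothetical helper: follow the subdirectory with the most inodes downwards
# and print the directory where the descent stops.
inode_hog() {
    dir=${1:-/}
    while :; do
        # Inodes used by the current directory and everything below it.
        total=$(du -sx --inodes "$dir" 2>/dev/null | cut -f1)
        # Immediate subdirectory with the highest inode count, if any.
        best=$(find "$dir" -xdev -mindepth 1 -maxdepth 1 -type d \
                   -exec du -sx --inodes {} + 2>/dev/null | sort -n | tail -n 1)
        [ -n "$best" ] || break
        count=$(printf '%s\n' "$best" | cut -f1)
        path=$(printf '%s\n' "$best" | cut -f2-)
        # Assumed stopping rule: the child must hold at least half the inodes.
        [ $((count * 2)) -ge "${total:-0}" ] || break
        dir=$path
    done
    printf '%s\n' "$dir"
}

# Example (run as root so that every directory is readable): inode_hog /
```

On a large tree this walks the same files several times, so it can take a while; starting at a likely subtree such as /var is quicker.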

in this case, as girish had mentioned, it was caused by a misconfigured nullmailer, writing tons of error messages to /var/spool/nullmailer/failed, using 4.4M inodes…
deleting that dir eased the situation immediately.
rebooting the server and restarting all failed apps (GUI & CLI) fixed it.

thanks for all your help
