server down: apps not restarting
-
hey,
today I'm experiencing a major downtime with nearly all cloudron apps.
trying to restart the apps gives the following error:
An error occurred during the restart app operation: server error: (HTTP code 500) server error - Cannot restart container 9b309fe6-a68b-47d5-86d7-2797aa67ccdd: failed to create OCI runtime console socket: mkdir /tmp/pty465273103: no space left on device: unknown
reboot did not change the situation.
most apps are down.
there was NO notification!!
other things that do not work are bash-completion, apt, etc.
they all use /tmp, which seems too small or full.
from the sysadmin side, I did not make any changes to the setup in the last week.
Cloudron 5.0.6 runs on a VM with 8G RAM and 2 procs
Ubuntu 18.04
TIA
guenter
-
@chymian-0 said in server down: apps not restarting:
create OCI runtime console socket: mkdir /tmp/pty465273103: no space left on device: unknown
Since rebooting worked, I assume you have disk space left on the system? Do you have SSH (root) access to the server? Does the Cloudron web SSH console work? Seems to be an issue with cgroups to me. Does your server come with any kind of limitation?
Please post the output of:
df -h
df -i
cat /proc/cgroups
docker info; echo; echo;
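If the df output is long, a small filter can pick out the filesystems that are close to full, for blocks and for inodes separately. This is only a sketch: flag_full is a made-up helper name, the 90 percent default is an arbitrary threshold, and it assumes mount points without spaces.

```shell
# flag_full LABEL [THRESHOLD]
# Reads `df -P` (or `df -P -i`) output on stdin and prints every filesystem
# whose use percentage (column 5) is at or above THRESHOLD (default 90).
# Hypothetical helper, not part of Cloudron or any package.
flag_full() {
    local label="$1" threshold="${2:-90}"
    awk -v label="$label" -v t="$threshold" '
        NR > 1 {
            pct = $5
            sub(/%$/, "", pct)
            # Non-numeric values such as "-" are skipped.
            if (pct ~ /^[0-9]+$/ && pct + 0 >= t)
                print label, pct "%", $6
        }'
}

# Possible usage on the affected server:
#   df -P    | flag_full blocks
#   df -P -i | flag_full inodes
```

Checking both matters because "no space left on device" is reported for exhausted inodes as well as for exhausted blocks.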
-
Have you contacted the developers via their email (support @ cloudron)? They usually get back within hours. I think that's faster than via this discussion forum.
-
It could be the system has changed since I had a similar problem in July 2018, but in my case old images had remained in /boot when they should have been deleted. I suggest not running any command that changes anything, because I have no idea if the commands are still relevant in 2020. I had to shut down the Cloudron:
sudo systemctl stop box
sudo systemctl stop docker
then run this command to see whether old kernel images were still installed:
sudo dpkg --list 'linux-image*'|awk '{ if ($1=="ii") print $2}'|grep -v `uname -r`
Then, when it was obvious my /boot was stuffed to the gills with prior linux-images, I had to remove old kernels, adjusting the below to the results from above:
sudo rm -rf /boot/*-4.4.0-{98,97,96,93,62}-*
Then, automatically remove unneeded kernels
sudo purge-old-kernels
After that, bring Cloudron back online:
sudo systemctl restart box
sudo systemctl restart docker
sudo systemctl restart cloudron.target
The fact that this occurred in /boot, and not in the main partition, had thrown us for a little while.
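For anyone who wants to look before deleting anything, the listing step above can be wrapped in a read-only helper. This is a sketch under assumptions: stale_kernel_packages is a name invented here, it only prints package names, and like the one-liner above it will also list metapackages whose names do not contain the running kernel version.

```shell
# stale_kernel_packages [RUNNING_KERNEL]
# Reads `dpkg --list` output on stdin and prints installed ("ii") packages
# whose name does not contain the running kernel release. Removes nothing.
# Hypothetical helper; RUNNING_KERNEL defaults to `uname -r`.
stale_kernel_packages() {
    local running="${1:-$(uname -r)}"
    # -F treats the release as a literal string, so dots are not wildcards.
    awk '$1 == "ii" { print $2 }' | grep -v -F -- "$running" || true
}

# Possible usage:
#   dpkg --list 'linux-image*' | stale_kernel_packages
```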
-
It could also be that you are storing backups locally. You can check in the backup tab at yourcloudron.com/#/backups. If so, you will have to delete those somehow. The one line in your error message certainly points to the main culprit:
no space left on device
You need to figure out what's using up the space.
-
@chymian-0 said in server down: apps not restarting:
/tmp & /dev/pts are pseudo-filesystems and are not managed via fstab.
they are too small.
And
mkdir /tmp/pty465273103: no space left on device: unknown
Have you checked if /tmp is mounted correctly and is writable? It should appear in df -h even if it is a pseudo-filesystem. Since you provided no information, it's hard to help you. Please note that support time is expensive and Cloudron's support only covers problems that are directly caused by Cloudron. In addition, time spent on support cannot be used for development, so it is in our best interest to help you here.
-
@subven
yes, sure
/tmp is not a tmpfs, it's on root, with GBs free.
it seems to have to do with cgroups and the space within the containers.
when the system containers and one app are running, it's exhausted.
I tried an older kernel, same.
??
thx everybody for trying to help.
I think that's a pure cloudron/system/cgroup problem, as I haven't touched that system,
and I never came across it on my various other docker projects/servers.
-
@chymian-0 Sure, will be happy to take a look immediately. Are you able to run
cloudron-support --enable-ssh
and then send a mail to support@cloudron.io with your domain name/IP?
If that command doesn't work, put our ssh keys in your
/root/.ssh/authorized_keys
(https://cloudron.io/documentation/support/#ssh-keys)
-
@chymian-0 From what I can tell, there is inode exhaustion in the rootfs. If you run
df -i
it tells you that you have run out of inodes. I think this is because this is run on top of btrfs. btrfs is notorious for this. We used to use btrfs on Cloudron 2-3 years ago and gave up because there was always some issue or other like this. You can try running
btrfs balance
from outside the Cloudron to free up some space, but I am not a btrfs expert.
-
@chymian-0 The easiest fix is to just give the rootfs more space. Is this possible?
Here's some discussion about it - https://lwn.net/Articles/724522/
-
kudos to @girish
he found the real problem (out of inodes) within minutes.
from there, we could nail down the cause by following this:
https://unix.stackexchange.com/questions/26598/how-can-i-increase-the-number-of-inodes-in-an-ext4-filesystem
TL;DR:
one cannot raise the number of inodes after fs creation. normally, one would have to tar up the rootfs, reformat it, and restore.
but to find out what is consuming all the inodes, one can try the following:
du -s --inodes * 2>/dev/null |sort -g
then cd into the last dir in the output and repeat.
Full disclosure: not all OSes support the --inodes flag for the du command (my macOS does not), but many Linux distributions do.
one has to cd into the dir with the most inodes, recursively going down the tree, to finally find the dir with the biggest inode consumption.
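The manual cd-and-repeat loop can be sketched as a small function. This is not what was actually run on the server, just an illustration: inode_hog is a made-up name, it needs a du that supports --inodes (GNU coreutils), and the stop rule (stop once the busiest subdirectory holds less than half of its parent's inodes) is an arbitrary choice made here.

```shell
# inode_hog [DIR]
# Walks down from DIR (default: current directory), always entering the
# subdirectory with the most inodes, and prints where the walk stops.
# Hypothetical helper; requires `du --inodes` from GNU coreutils.
inode_hog() {
    local dir="${1:-.}" total line count child
    while :; do
        # Inodes used by the current directory tree (-x: stay on one filesystem).
        total=$(du -s -x --inodes "$dir" 2>/dev/null | cut -f1)
        # Busiest immediate subdirectory, as "count<TAB>path".
        line=$(find "$dir" -mindepth 1 -maxdepth 1 -type d \
                   -exec du -s -x --inodes {} + 2>/dev/null | sort -n | tail -n 1)
        [ -n "$line" ] || break            # no subdirectories left
        count=$(printf '%s\n' "$line" | cut -f1)
        child=$(printf '%s\n' "$line" | cut -f2-)
        # Stop when most of the inodes sit directly in the current directory.
        [ $((count * 2)) -ge "$total" ] || break
        dir="$child"
    done
    printf '%s\n' "$dir"
}

# Possible usage (as root, starting at the filesystem root):
#   inode_hog /
```

On a large filesystem this rescans the tree at every level, so it can take a while; it only reads and never deletes anything.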
in this case, as girish had mentioned, it was caused by a misconfigured nullmailer writing tons of error messages to /var/spool/nullmailer/failed, using 4.4M inodes…
deleting that dir eased the situation right away.
rebooting the server and restarting all failed apps (GUI & CLI) fixed it.
thanks for all your help
-