Fix for kernel bug in Ubuntu 20.04 causing various issues
-
This fix is not needed anymore. Ubuntu has released `5.4.0-135-generic`.

Cloudron enables automatic Ubuntu security updates. Around 2022-11-17, the Linux kernel was updated to `5.4.0-132-generic`. You can find the automatic updates log in `/var/log/apt/history.log`. This kernel has a bug that causes containerd, Prometheus node exporter and other services to fail. On Cloudron, this manifests itself as:
- automatic updates appear to get stuck in 'cleaning up old install'
- cron jobs don't work anymore
- file permissions inside containers become incorrect
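A quick way to check whether a server is running the affected kernel is to compare the output of `uname -r` against the release named above. This is only a sketch: `is_affected_kernel` is a made-up helper name, and it flags only `5.4.0-132-generic` because that is the one release this post identifies as broken.

```shell
#!/bin/sh
# Sketch only: is_affected_kernel is a hypothetical helper, not a Cloudron tool.
# It flags only 5.4.0-132-generic, the release named in this post.
is_affected_kernel() {
    [ "$1" = "5.4.0-132-generic" ]
}

release=$(uname -r)
if is_affected_kernel "$release"; then
    echo "$release: this is the affected kernel"
else
    echo "$release: not the kernel named in this post"
fi
```

To see when the kernel was installed, you can search the log mentioned above for the package name, for example `grep linux-image-5.4.0-132 /var/log/apt/history.log`.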
For the moment, it's best to revert to the previous kernel, `5.4.0-131-generic`. How you do this depends on your VPS provider. Some VPS providers allow you to change the kernel via their control panels.

Please be careful with the instructions below. You might have to fine-tune them based on your setup/provider.

Many modern providers just use GRUB 2 as the bootloader (DigitalOcean and Linode, to name a few). On such a VPS, please change the kernel as follows:
- Highly recommend taking a snapshot of the server first, in case something goes wrong.
- SSH into the server and run:
    apt install linux-image-5.4.0-131-generic linux-modules-extra-5.4.0-131-generic
    apt-mark hold linux-generic linux-image-generic linux-headers-generic
- Edit `/etc/default/grub`. Find the line `GRUB_DEFAULT=0`. Change this to `GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.4.0-131-generic"`. It is important that you get this line right, otherwise your server may not boot!
- Then run:
    update-grub
    reboot
- After reboot, `uname -nar` will say `5.4.0-131-generic`.
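Since a wrong `GRUB_DEFAULT` line can leave the server unbootable, it may help to make the edit on a copy of the file and inspect the result before touching the real one. The following is a minimal sketch of that idea, not an official procedure. All three helper names are hypothetical, and the `/boot/grub/grub.cfg` path in the usage comments is an assumption about a stock Ubuntu install.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not Cloudron or GRUB tools.

# Print the GRUB_DEFAULT line for a kernel release such as 5.4.0-131-generic.
grub_default_line() {
    printf 'GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux %s"\n' "$1"
}

# Replace the GRUB_DEFAULT line in FILE (ideally a copy of /etc/default/grub).
# Usage: set_grub_default FILE RELEASE
set_grub_default() {
    gd_file=$1
    gd_line=$(grub_default_line "$2")
    gd_tmp=$(mktemp) || return 1
    if sed "s|^GRUB_DEFAULT=.*|$gd_line|" "$gd_file" > "$gd_tmp"; then
        cat "$gd_tmp" > "$gd_file"
        gd_status=$?
    else
        gd_status=1
    fi
    rm -f "$gd_tmp"
    return "$gd_status"
}

# Succeed if CFG (assumed to be a generated grub.cfg) mentions RELEASE.
# Usage: has_menu_entry CFG RELEASE
has_menu_entry() {
    grep -qF "Ubuntu, with Linux $2" "$1"
}

# Example on a real server (paths are assumptions, adjust to your setup):
#   cp /etc/default/grub /tmp/grub.new
#   set_grub_default /tmp/grub.new 5.4.0-131-generic
#   diff /etc/default/grub /tmp/grub.new
#   has_menu_entry /boot/grub/grub.cfg 5.4.0-131-generic && echo "menu entry found"
```

The menu entry check is only meaningful once the kernel package is installed and the GRUB config has been regenerated. If it finds nothing, the `GRUB_DEFAULT` value most likely does not match any entry on your server.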
To reverse the above changes:
- Run:
    apt-mark unhold linux-generic linux-image-generic linux-headers-generic
    unattended-upgrade -d
  When running this, you will see the new kernel `5.4.0-135-generic` getting installed.
- Edit `/etc/default/grub`. Change the line to `GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.4.0-135-generic"`.
- Then run:
    update-grub
    reboot
- After reboot, `uname -nar` will say `5.4.0-135-generic`.
- Run `apt remove linux-image-5.4.0-131-generic linux-modules-extra-5.4.0-131-generic` to remove the old kernel.
- Edit `/etc/default/grub`. Change the line to `GRUB_DEFAULT=0`, then run `update-grub`.
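After reversing the changes, you may want to confirm that the kernel metapackages are no longer on hold, otherwise automatic updates will keep skipping them. A small sketch, assuming the list of held packages arrives one name per line (which is how `apt-mark showhold` prints it); `held_kernel_packages` is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: held_kernel_packages is a hypothetical helper name.
# Reads package names (one per line) on stdin and prints those that are
# among the three kernel metapackages held in the steps above.
held_kernel_packages() {
    grep -x -e linux-generic -e linux-image-generic -e linux-headers-generic
}

# Usage on a real server:
#   apt-mark showhold | held_kernel_packages
# No output means none of the three packages is held any more.
```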
-
@jdaviescoates said in Fix for kernel bug in Ubuntu 20.04 causing various issues:
@imc67 I've not tried yet, but I'm interested in why @girish recommends the kernel reversion as opposed to updating Ubuntu 22.04?
Was thinking about that too, but then first you need to update Cloudron to 7.3.2 and that's not stable yet? Or won't even update because of the issues?
-
@imc67 said in Fix for kernel bug in Ubuntu 20.04 causing various issues:
update Cloudron to 7.3.2 and that's not stable yet?
I'm already on it.
@imc67 said in Fix for kernel bug in Ubuntu 20.04 causing various issues:
Or won't even update because of the issues?
That could be why I guess, might be risky to update with the issue.
But still wondering if it might be the best option for me given I'm already on 7.3.2
-
@girish OK, thanks. I applied this https://forum.cloudron.io/topic/8101/fix-for-kernel-bug-in-ubuntu-20-04-causing-various-issues on most of our active instances.
- Should we apply fix to all instances, even those that seemingly don't show any problems?
- What will be next? When they fix kernel issue, we have to operate again to remove fix that we did?
Thanks a lot
-
-
@jdaviescoates Updating to Ubuntu 22.04 is a much riskier endeavor than just downgrading the kernel. Downgrading the kernel only takes 5 minutes. In fact, just this weekend I upgraded all our servers from Ubuntu 18 to 20 and all of them just completely hosed. Each one failed in a different place: one in upgrading cloud-init, another is still stuck in some "conflicting package", and for another I had to switch from the DO mirror to Canonical's mirror. I have generally not had good experiences with distro upgrades (on the server at least). On desktop Ubuntu, I feel things are better, maybe because I have the PC in front of me and have more control of the boot loader.
Ubuntu 22 should work fine with Cloudron 7.3 though. But note that it requires you to also rebuild all containers because of the cgroup v1 to cgroup v2 migration. All this is in the docs, but we have had at least 2-3 bug reports of the migration script not working perfectly.
-
@p44 said in Fix for kernel bug in Ubuntu 20.04 causing various issues:
Should we apply fix to all instances, even those that seemingly don't show any problems?
I think it's best to apply to all instances running that kernel.
What will be next? When they fix kernel issue, we have to operate again to remove fix that we did?
Yes, I think we can unhold the kernel packages and then it will keep auto updating to latest kernel.
-
@girish thanks, makes sense.
I guess rather than updating to 22.04 (if people wanted to try that route) it could be safer/ easier to migrate to a fresh install of 22.04?
@imc67 fyi I just did the fix above on my primary netcup server (the other one is only running an unloved instance of Uptime Kuma atm), very easy and seems to have gone smoothly.
-
@jdaviescoates said in Fix for kernel bug in Ubuntu 20.04 causing various issues:
seems to have gone smoothly.
Not so fast, somehow my backup mount lost its permissions and now I'm unable to remount it.
-
@jdaviescoates said in Fix for kernel bug in Ubuntu 20.04 causing various issues:
I guess rather than updating to 22.04 (if people wanted to try that route) it could be safer/ easier to migrate to a fresh install of 22.04?
yes, definitely.
-
-
-
@girish Hello Girish,
After the fix, «automatic updates appear to get stuck in 'cleaning up old install'» has been solved, but it seems the "cron jobs don't work anymore" problem is still there...
Do you have any other feedback on this issue?
-
@p44 said in Fix for kernel bug in Ubuntu 20.04 causing various issues:
"cron jobs don't work anymore" problem is still there.
Seems to have gone for me. Previously my Nextclouds were giving me warnings about that, but they aren't doing that anymore.
-
@jdaviescoates Thanks a lot, I'll do more thorough tests... it seems only a few cron jobs are executed in the external cron panel.
-
-
@girish said in Fix for kernel bug in Ubuntu 20.04 causing various issues:
I upgraded all our servers from Ubuntu 18 to 20 and all of them just completely hosed
Interesting.... I upgraded one server from 16 > 18 > 20 and two more servers from 18 > 20 using the Cloudron guides and never had a problem. I was going to upgrade to 22 thinking it'll all be easy but a little more unsure now.
-
@avatar1024 I think it's some issue with the DO apt mirrors. Something is out of rsync. I tested upgrading on Vultr/Linode and they seem perfect.
-
-
Ubuntu released a new kernel with the fix, `5.4.0-135.152` (https://bugs.launchpad.net/ubuntu/+source/containerd/+bug/1996678/comments/28). I don't know if this kernel arrives as a security update.
-
At least on the Vultr Ubuntu 20.04 repository mirrors, the new fixed kernel is already available via security updates. You can check whether this is the same on your instance by running:
    apt-get update && apt list --upgradable | grep "\-security"
If it lists
    linux-generic/focal-updates,focal-security 5.4.0.135.133 amd64 [upgradable from: 5.4.0.132.132]
then you have to unhold the previously held packages, and the kernel will eventually update as normal:
    apt-mark unhold linux-generic linux-image-generic linux-headers-generic
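If you have several instances to check, the same test can be scripted. This is a rough sketch that assumes the `apt list --upgradable` line format shown above; `security_kernel_candidate` is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: security_kernel_candidate is a hypothetical helper name.
# Reads `apt list --upgradable` output on stdin and prints the candidate
# version of linux-generic if it is offered from a "-security" pocket.
security_kernel_candidate() {
    awk '$1 ~ /^linux-generic\// && $1 ~ /-security/ { print $2 }'
}

# Usage on a real server (2>/dev/null hides apt's warning about scripted use):
#   apt-get update && apt list --upgradable 2>/dev/null | security_kernel_candidate
```

An empty result means `linux-generic` is not currently offered from a security pocket on that mirror.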