Backup fails about 50-60% of the time
-
Hi guys,
I've been using Backblaze for backups for some time now. Lately I have been seeing a lot of automated backups stalling and never finishing. The Backups screen shows a backup task in progress at some random step, and it never times out (today it was still sitting there after 9 pm, while the backup is scheduled to kick off at 1 am).
This happens about 50-60% of the time lately. For example, over the past week I had only a couple of successful automated backups. I had planned on upgrading from 7.x to 8.x, and it took me 3 or 4 attempts over the weekend before a backup finally went through so that I could upgrade my Cloudron.
I'm using the basic .tgz format for backups (it seems to be the fastest way).
What I see in backup logs is the following:
2024-09-23T15:07:18.019Z box:database Connection 194 error: Packets out of order. Got: 0 Expected: 2 PROTOCOL_PACKETS_OUT_OF_ORDER
2024-09-23T15:07:18.020Z box:database Connection 195 error: Packets out of order. Got: 0 Expected: 2 PROTOCOL_PACKETS_OUT_OF_ORDER
2024-09-23T15:07:18.021Z box:database Connection 197 error: Packets out of order. Got: 0 Expected: 2 PROTOCOL_PACKETS_OUT_OF_ORDER
I've seen something very similar before.
I'm not sure if Backblaze is to blame. I had some issues with them a few months ago, when I couldn't properly access my files via their website. After a ticket things started working fine, but it's been a couple of months since then. And if they are to blame, I'd appreciate some guidance on where and how I can obtain logs or details that would prove it's them and not me.
Thank you!
-
@bazinga usually, the packets out of order error happens when the system is low on memory. Can you check the dmesg output and see if you have any out of memory messages? I would also check the health of the disk. In the System UI, there is a disk speed value. What is the disk speed?
-
@joseph Disk speed is indicated as 211MB/sec, which I think is pretty healthy. It's an SSD disk on Azure infrastructure, so I don't have any reason to believe that the disk would be unhealthy.
As far as memory, here is the 24 hr graph of memory utilization from Cloudron:
I don't have much running on my instance and the VM has 8GB of RAM assigned.
As for the dmesg output, I don't see any memory-related entries there, just a bunch of "packet dropped" messages for the network adapter. Is there anything specific I need to run or look for?
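For anyone wondering what to look for: when the kernel's OOM killer runs, dmesg lines usually contain phrases such as "Out of memory", "invoked oom-killer", or "Killed process". Below is a minimal sketch that filters dmesg text for those phrases. The function name `find_oom_lines` and the sample lines are made up for illustration and are not taken from this server.

```python
import re

# Phrases the Linux kernel typically logs when the OOM killer is triggered.
OOM_PATTERN = re.compile(r"out of memory|oom-killer|killed process", re.IGNORECASE)


def find_oom_lines(dmesg_text):
    """Return the lines of dmesg output that look like OOM-killer activity."""
    return [line for line in dmesg_text.splitlines() if OOM_PATTERN.search(line)]


# Illustrative sample only (hypothetical lines, not from the poster's system).
sample = "\n".join([
    "[1.0] eth0: packet dropped",
    "[2.0] mysqld invoked oom-killer: gfp_mask=0x100cca, order=0",
    "[3.0] Out of memory: Killed process 1234 (mysqld)",
])

for line in find_oom_lines(sample):
    print(line)
```

To use it on a real system, save the output of `dmesg` to a file and pass the file's contents to `find_oom_lines`. An empty result suggests the OOM killer has not run since the kernel log was last cleared, though it does not rule out memory pressure in general.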
-
@bazinga FWIW, I also have had backup failures and despite troubleshooting the cause or source was never definitively identified. This isn't a knock against Cloudron... these things seem to happen, and the myriad of potential causes, cascading, mean it's better to just start over with a different approach, a different service, etc. Good luck.
-
@djxx Sorry, I don't know what you mean exactly. If we're talking about backup configuration, then I have the memory limit set to 1.5GB and the upload part size set to 10MB. I don't think I changed any of these values when setting up Backblaze as a target.
If your second message is saying that the RAM required must be greater than the projected backup size, then I would see that as a problem with Cloudron. I cannot imagine how that could work for installs with tens of gigabytes of data. @nebulon is that really the case?
My backups are about 6.5 GB, if I were to trust Backblaze console.
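For a sense of scale, here is a rough calculation of how many upload parts those numbers imply. It is only a sketch: it assumes decimal units (1 GB = 1000 MB) and that the archive is split into equal 10MB parts, and it says nothing about how Cloudron actually buffers data internally. The helper name `part_count` is hypothetical.

```python
import math

MB = 1000 ** 2  # assumption: decimal megabytes
GB = 1000 ** 3  # assumption: decimal gigabytes


def part_count(backup_bytes, part_bytes):
    """Number of parts needed to upload backup_bytes in chunks of part_bytes."""
    return math.ceil(backup_bytes / part_bytes)


backup_size = int(6.5 * GB)   # size reported by the Backblaze console
part_size = 10 * MB           # configured upload part size
memory_limit = int(1.5 * GB)  # configured backup memory limit

print(part_count(backup_size, part_size))  # 650
print(memory_limit // part_size)           # 150 parts' worth of memory
```

So under these assumptions a single part is a small fraction of the configured memory limit, while the whole backup is more than four times larger than it.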
-
@scooke Yeah, I was just hoping that the logging would be sufficient to point to one side of this equation or the other as the root cause. Maybe there is a way to change the log level or something like that.
As for changing the destination for the backups: I used Minio on a separate Azure VM for years and it was all good, until something happened and I needed to update Minio, only to find that Minio no longer supported the S3 APIs, or something to that effect. So I picked something that I've heard a lot about over the years: Backblaze.
The bigger question here - if Cloudron is at fault (one way or another) then changing the provider wouldn't solve the issue.
-
Today the backup failed again. It was stuck and I had to cancel it. Same sort of issue:
Sep 28 01:09:00 box:tasks update 15221: {"percent":43.85714285714286,"message":"Copying part 4 - Etag: \"1cfa597bd5187486ed3704d6a556f513\""}
Sep 28 09:08:09 box:database Connection 350 error: Packets out of order. Got: 0 Expected: 2 PROTOCOL_PACKETS_OUT_OF_ORDER
Sep 28 09:08:09 box:database Connection 344 error: Packets out of order. Got: 0 Expected: 2 PROTOCOL_PACKETS_OUT_OF_ORDER
Sep 28 09:08:49 box:database Connection 347 error: Packets out of order. Got: 0 Expected: 2 PROTOCOL_PACKETS_OUT_OF_ORDER
Sep 28 09:09:01 box:database Connection 345 error: Packets out of order. Got: 0 Expected: 2 PROTOCOL_PACKETS_OUT_OF_ORDER
Sigh. This is very annoying as I don't even know which system or component is at fault. But judging by the "box:database" component name in the logs, it seems this is some error within Cloudron, even though I'm not sure what database this refers to.
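One thing the timestamps do show is how long the task sat idle between the last progress update and the first database error. A small sketch of that calculation follows; the year 2024 is an assumption (the syslog-style prefix carries no year), and `parse_ts` is a hypothetical helper.

```python
from datetime import datetime


def parse_ts(line, year=2024):
    """Parse a syslog-style 'Sep 28 01:09:00' prefix; the year is assumed."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


# Prefixes of the two relevant log lines quoted above.
last_progress = "Sep 28 01:09:00 box:tasks update 15221"
first_error = "Sep 28 09:08:09 box:database Connection 350 error"

stall = parse_ts(first_error) - parse_ts(last_progress)
print(stall)  # 7:59:09
```

That is almost exactly eight hours of silence after "Copying part 4", which suggests the task hung well before the database errors appeared.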
-
@Nozy Thanks for letting me know about IPv6. I didn't even think about whether it's enabled or not, and it turns out it is.
Would turning it off have any detrimental effects on Cloudron, do you know? Do I need to be aware of something before switching it off?
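For reference, on Linux the per-interface IPv6 state can be read from `/proc/sys/net/ipv6/conf/<interface>/disable_ipv6` (1 means disabled, 0 means enabled). Here is a minimal sketch that lists it for every interface; the function name `ipv6_disabled_by_interface` is hypothetical, and the code only reads the setting without changing anything.

```python
from pathlib import Path


def ipv6_disabled_by_interface(conf_root="/proc/sys/net/ipv6/conf"):
    """Map each interface name to True if IPv6 is disabled on it."""
    root = Path(conf_root)
    status = {}
    if not root.is_dir():
        # No IPv6 sysctl tree, e.g. IPv6 support is not loaded at all.
        return status
    for iface_dir in sorted(root.iterdir()):
        flag = iface_dir / "disable_ipv6"
        if flag.is_file():
            status[iface_dir.name] = flag.read_text().strip() == "1"
    return status


if __name__ == "__main__":
    for name, disabled in ipv6_disabled_by_interface().items():
        print(f"{name}: IPv6 {'disabled' if disabled else 'enabled'}")
```

The `all` and `default` entries that show up in the output are kernel pseudo-interfaces, not real network adapters.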
-
@bazinga turning off ipv6 should be safe. Do this at interface level instead of server level. See https://superuser.com/questions/575684/how-to-disable-ipv6-on-a-specific-interface-in-linux . That said, I am not sure how it helps here though...
"Even though I'm not sure what database this refers to."
The Cloudron 'box' code connects to the local MySQL database. It is losing the connection and/or getting that error from the database, as seen in the logs. It's not clear why. Can you also check /var/log/mysql/error.log?
-
@joseph I cannot even fathom how switching off IPv6 would help with this issue, but since you're saying that the backup task uses the MySQL db and connects to it, who knows, maybe IPv6 might have some effect there.
I've checked the logs for MySQL. There are a few of them there, zipped, and they are all 0 bytes / empty, even on the days when the backup task fails.
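In case it helps anyone repeat the check, here is a small sketch that lists the on-disk size of each MySQL error log, rotated copies included. The directory comes from the path mentioned above; the `error.log*` pattern and the function name `log_sizes` are assumptions.

```python
from pathlib import Path


def log_sizes(log_dir="/var/log/mysql", pattern="error.log*"):
    """Return {file name: size in bytes} for matching logs in log_dir."""
    root = Path(log_dir)
    if not root.is_dir():
        return {}
    # Sizes are on-disk sizes; compressed files are not decompressed.
    return {p.name: p.stat().st_size for p in sorted(root.glob(pattern))}
```

Calling `log_sizes()` with no arguments checks /var/log/mysql and may need root, since that directory is often not readable by regular users.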
Backup keeps failing - in the past 4 days it succeeded only once. This is getting really annoying as not having backups for DR is a problem.
-
@bazinga fwiw, I didn't recommend switching off IPv6. I only said it's safe to switch it off.
Unfortunately, given the information, I cannot make out what the issue is. Can you write to support@cloudron.io so we can debug what's going on?
-
@bazinga I had similar issues in recent days using Contabo Object Storage on a Contabo VPS.
I ended up fixing it all by simply switching to a different storage solution, in my case Hetzner Storage Box through SSHFS, and that solved most of my pains.
If you can, try that and save yourself some time and suffering.