Backup fails about 50-60% of the time
-
@bazinga Usually, the "packets out of order" error happens when the system is low on memory. Can you check the dmesg output and see if there are any out-of-memory messages? I would also check the health of the disk. In the System UI, there is a disk speed value. What is the disk speed?
-
@joseph Disk speed is shown as 211 MB/sec, which I think is pretty healthy. It's an SSD on Azure infrastructure, so I have no reason to believe the disk is unhealthy.
As for memory, here is the 24-hour memory utilization graph from Cloudron:
I don't have much running on my instance, and the VM has 8 GB of RAM assigned.
As for the dmesg output, I don't see any memory-related entries there, just a bunch of "packet dropped" messages for the network adapter. Is there anything specific I need to run or look for?
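On what to look for: when the kernel's OOM killer fires, it typically logs lines such as "Out of memory: Killed process ..." or "... invoked oom-killer". Below is a minimal Python sketch that filters dmesg output for those phrases. The pattern list is an assumption (exact wording varies between kernel versions), and reading the kernel ring buffer usually requires root.

```python
import re
import subprocess

# Common phrasings of kernel OOM-killer messages.
# Assumption: wording differs between kernel versions, so this list is illustrative.
OOM_PATTERNS = [
    r"out of memory",
    r"oom-kill",
    r"invoked oom-killer",
    r"killed process \d+",
]
OOM_RE = re.compile("|".join(OOM_PATTERNS), re.IGNORECASE)


def find_oom_lines(dmesg_text: str) -> list[str]:
    """Return the dmesg lines that look like out-of-memory events."""
    return [line for line in dmesg_text.splitlines() if OOM_RE.search(line)]


def read_dmesg() -> str:
    # dmesg usually needs root (or kernel.dmesg_restrict=0) to read the ring buffer.
    # -T prints human-readable timestamps, handy for lining up with backup times.
    result = subprocess.run(["dmesg", "-T"], capture_output=True, text=True, check=True)
    return result.stdout
```

Usage on the server: `for line in find_oom_lines(read_dmesg()): print(line)`. An empty result is consistent with the "no memory-related entries" observation above.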
-
@bazinga FWIW, I have also had backup failures, and despite troubleshooting, the cause was never definitively identified. This isn't a knock against Cloudron; these things happen, and with so many potential causes, possibly cascading, it's often better to just start over with a different approach, a different service, etc. Good luck.
-
@djxx Sorry, I'm not sure exactly what you mean. If we're talking about the backup configuration, I have the memory limit set to 1.5 GB and the upload part size set to 10 MB. I don't think I changed either of these values when setting up Backblaze as a target.
If what your second message says is true, that the RAM required must be greater than the projected backup size, then I would see that as a problem with Cloudron. I cannot imagine how that would work for installs with tens of gigabytes of data. @nebulon is that really the case?
My backups are about 6.5 GB, if the Backblaze console is to be trusted.
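On the RAM-versus-backup-size question, a rough back-of-the-envelope check. It assumes a generic streaming multipart upload (an assumption, not a description of Cloudron's internals): the archive is sent in fixed-size parts, so buffered memory scales with part size times the number of parts in flight rather than with total backup size. The figures come from this thread; `parts_in_flight` is a hypothetical value, and GB is treated as GiB for simplicity. The 10,000-part ceiling is the per-object limit of S3-style multipart uploads, which B2 also applies to large files.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB

# Values quoted in the thread (GB treated as GiB for simplicity).
backup_size = int(6.5 * GB)   # size reported by the Backblaze console
part_size = 10 * MB           # "upload part size" in the backup settings
memory_limit = int(1.5 * GB)  # backup task memory limit

# Hypothetical: number of parts uploaded concurrently (not taken from Cloudron).
parts_in_flight = 4

# S3-style multipart uploads allow at most 10,000 parts per object.
MAX_PARTS = 10_000

num_parts = math.ceil(backup_size / part_size)
buffer_estimate = part_size * parts_in_flight  # memory held by in-flight parts only

print(f"parts needed: {num_parts} (limit {MAX_PARTS})")
print(f"approx. upload buffers: {buffer_estimate // MB} MB of a {memory_limit // MB} MB limit")
```

Under these assumptions, a 6.5 GB backup needs about 666 parts and only tens of megabytes of part buffers, well within a 1.5 GB limit. If a real implementation buffers more than this, that would show up as memory growth during the upload phase.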
-
@scooke Yeah, I was just hoping the logging would be enough to point to one side of this equation or the other as the root cause, perhaps with a way to change the log level or something like that.
As for changing the backup destination: I used Minio on a separate Azure VM for years and it was all good, until something happened and I needed to update Minio, only to find that Minio no longer supported the S3 API, or something to that effect. So I picked something I've heard a lot about over the years: Backblaze.
The bigger question here: if Cloudron is at fault (one way or another), then changing the provider wouldn't solve the issue.
-
Today the backup failed again: it got stuck and I had to cancel it. Same sort of issue:
Sep 28 01:09:00 box:tasks update 15221: {"percent":43.85714285714286,"message":"Copying part 4 - Etag: \"1cfa597bd5187486ed3704d6a556f513\""}
Sep 28 09:08:09 box:database Connection 350 error: Packets out of order. Got: 0 Expected: 2 PROTOCOL_PACKETS_OUT_OF_ORDER
Sep 28 09:08:09 box:database Connection 344 error: Packets out of order. Got: 0 Expected: 2 PROTOCOL_PACKETS_OUT_OF_ORDER
Sep 28 09:08:49 box:database Connection 347 error: Packets out of order. Got: 0 Expected: 2 PROTOCOL_PACKETS_OUT_OF_ORDER
Sep 28 09:09:01 box:database Connection 345 error: Packets out of order. Got: 0 Expected: 2 PROTOCOL_PACKETS_OUT_OF_ORDER
Sigh. This is very annoying, as I don't even know which system or component is at fault. Judging by the "box:database" component name in the logs, though, it seems this is an error within Cloudron. Even though I'm not sure what database this refers to.
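A minimal Python sketch for summarizing entries like the excerpt above: it counts the PROTOCOL_PACKETS_OUT_OF_ORDER errors per connection and measures the gap between the last task progress update and the first error. The line format is inferred from this one excerpt, so other box log lines may not match. The placeholder year exists only because syslog-style timestamps omit it.

```python
import re
from collections import Counter
from datetime import datetime, timedelta
from typing import Optional

# Format inferred from the excerpt in this thread (assumption: other lines may differ).
TS = r"(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2})"
ERROR_RE = re.compile(
    TS + r" box:database Connection (?P<conn>\d+) error: .*?PROTOCOL_PACKETS_OUT_OF_ORDER"
)
PROGRESS_RE = re.compile(TS + r" box:tasks update \d+:")


def parse_ts(ts: str) -> datetime:
    # Syslog-style stamps have no year; a fixed placeholder year keeps strptime unambiguous.
    return datetime.strptime("2000 " + ts, "%Y %b %d %H:%M:%S")


def summarize(log_text: str) -> tuple[Counter, Optional[timedelta]]:
    """Count out-of-order errors per connection and the time from the last progress update to the first error."""
    # finditer works whether entries are on separate lines or run together on one line.
    errors = [(parse_ts(m["ts"]), int(m["conn"])) for m in ERROR_RE.finditer(log_text)]
    progress = [parse_ts(m["ts"]) for m in PROGRESS_RE.finditer(log_text)]
    per_conn = Counter(conn for _, conn in errors)
    gap = None
    if errors and progress:
        # Assumes the errors come after the last progress update, as in the excerpt.
        gap = min(t for t, _ in errors) - max(progress)
    return per_conn, gap
```

For the excerpt above, this reports one error on each of four connections (350, 344, 347, 345) and a gap of 7:59:09 between the "Copying part 4" update and the first error. In other words, the task was already stalled for hours before the database errors appeared. Whether that gap lines up with any idle or connection timeout on the MySQL side is an untested guess, not a diagnosis.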
-
@Nozy Thanks for letting me know about IPv6. I didn't even think to check whether it's enabled, and it turns out it is.
Do you know whether turning it off would have any detrimental effects on Cloudron? Is there anything I need to be aware of before switching it off?
-
@bazinga Turning off IPv6 should be safe. Do this at the interface level rather than the server level (see https://superuser.com/questions/575684/how-to-disable-ipv6-on-a-specific-interface-in-linux). That said, I'm not sure how it would help here...
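The Linux kernel exposes a per-interface `disable_ipv6` setting under `/proc/sys/net/ipv6/conf/<interface>/`, which corresponds to the sysctl `net.ipv6.conf.<interface>.disable_ipv6`. Below is a minimal Python sketch that toggles it for one interface. The interface name `eth0` is hypothetical (check yours with `ip link`). Writing the setting requires root, and the change does not persist across reboots.

```python
from pathlib import Path

PROC_IPV6_CONF = Path("/proc/sys/net/ipv6/conf")


def ipv6_disable_path(iface: str) -> Path:
    """Path of the per-interface disable_ipv6 knob for a given interface name."""
    if not iface or "/" in iface or iface in (".", ".."):
        raise ValueError(f"invalid interface name: {iface!r}")
    return PROC_IPV6_CONF / iface / "disable_ipv6"


def set_ipv6_disabled(iface: str, disabled: bool = True) -> None:
    """Disable (or re-enable) IPv6 on one interface only. Needs root; not persistent."""
    path = ipv6_disable_path(iface)
    if not path.exists():
        raise FileNotFoundError(f"no IPv6 settings found for interface {iface!r}")
    path.write_text("1\n" if disabled else "0\n")
```

Usage (as root): `set_ipv6_disabled("eth0")`. To make the change survive a reboot, the equivalent setting would normally go into a sysctl configuration file instead.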
"Even though I'm not sure what database this refers to."
The Cloudron 'box' code connects to the local MySQL database. According to the logs, it is losing the connection and/or getting that error from the database; it's not clear why. Could you also check /var/log/mysql/error.log?
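A minimal Python sketch for checking that log together with its rotated copies. It assumes logrotate-style naming (`error.log`, `error.log.1`, `error.log.2.gz`, ...), reads gzip-compressed copies transparently, and reports the uncompressed size of each file and how many lines contain the `[ERROR]` tag that MySQL uses for error entries.

```python
import gzip
from pathlib import Path


def read_log(path: Path) -> bytes:
    # Rotated copies are often gzip-compressed (e.g. error.log.2.gz); read them transparently.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as fh:
        return fh.read()


def summarize_mysql_logs(log_dir: str = "/var/log/mysql", base: str = "error.log"):
    """Return (file name, uncompressed bytes, [ERROR] line count) for the log and its rotations."""
    rows = []
    for path in sorted(Path(log_dir).glob(base + "*")):
        if not path.is_file():
            continue
        data = read_log(path)
        text = data.decode("utf-8", errors="replace")
        error_lines = sum(1 for line in text.splitlines() if "[ERROR]" in line)
        rows.append((path.name, len(data), error_lines))
    return rows
```

Usage (as root): `for row in summarize_mysql_logs(): print(row)`. A row with 0 bytes means that file really is empty once decompressed.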
-
@joseph I can't even fathom how switching off IPv6 would help with this issue, but since you're saying the backup task connects to the MySQL database, who knows, maybe IPv6 has some effect there.
I've checked the MySQL logs. There are a few compressed ones there, and they are all 0 bytes (empty), even on the days when the backup task fails.
The backup keeps failing: in the past 4 days it succeeded only once. This is getting really annoying, as not having backups for disaster recovery is a problem.
-
@bazinga FWIW, I didn't recommend switching off IPv6; I only said it's safe to switch it off.
Unfortunately, with the information so far, I can't make out what the issue is. Can you write to support@cloudron.io so we can debug what's going on?
-
@bazinga I had similar issues in recent days using Contabo Object Storage on a Contabo VPS.
I ended up fixing it simply by switching to a different storage solution, in my case Hetzner Storage Box via SSHFS, and that solved most of my problems.
If you can, try that and save yourself some time and suffering.
-
After some time, Girish added code that increases the retry count and timeouts, so B2 eventually started working, more or less.
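The actual change isn't shown in this thread. As a generic illustration of the idea, here is a minimal Python sketch that retries a flaky call with capped exponential backoff and random jitter. The function `with_retries` and its parameters are hypothetical names for this sketch, not Cloudron's code or API.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()` until it succeeds, backing off exponentially between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last error to the caller
            # base_delay * 1, 2, 4, ... capped at max_delay, scaled by jitter in [0.5, 1.0]
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable: the loop always returns or raises")
```

Usage, with a hypothetical upload function that accepts its own timeout: `with_retries(lambda: upload_part(part, timeout=120), retry_on=(TimeoutError, ConnectionError))`. Raising the retry count and the per-call timeout together gives an overloaded backend both more chances and more time per attempt. Restricting `retry_on` to transient errors avoids retrying failures that will never succeed.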
That said, I've just switched over to Hetzner and will see how it works for me. I don't feel like paying B2 if their backend is overloaded and constantly causes issues.