Rainy Sunday Tales
-
Do not adjust your web browser. What follows is a retelling of an event that may or may not have taken place. This story is shared as a warning to others and for the general entertainment of anyone finding themselves with nothing better to do than to read random forum posts on a %current_day%. Enjoy.
It was raining cats and dogs (not literally) on a Sunday afternoon when I tried to run a backup on my Cloudron. My seat was warm and comfortable from the way too much time I had spent in it since wrestling myself out of bed that morning. It provided soothing relief for what was to come.
At first, I suspected nothing wrong. Everything was as it seemed, with no error messages lurking behind a page refresh to upset my stomach.
It was backup restoration testing time, and I had readied myself with a fresh cup of PG Tips tea and a German cookie left over from the care parcel my mum sent for Christmas.
And then my plans for a relaxing afternoon, split between feeding the brood, killing NPCs in C&C Generals Zero Hour and waiting for backups to download, went to hell!
The manually triggered backup completed, and I sank into my chair the way a bowling ball sinks into a bean bag, not suspecting it was about to be thrown down the lane to meet nine pins of doom head-on.
When I tried to restore my backup set, I was met with an error message I couldn't fathom. So I tried again. And again. But to no avail: the tarball that supposedly rested safely in its volume was absent!
No backup could be restored. Frustrated, I topped up my tea and rummaged through my rapidly depleting supply of Oma's finest Christmas cookies in an attempt to get to the bottom of this.
I remembered that yesterday my server had run out of disk space because I had foolishly thrown a ton of data at a volume that turned out to be way too small (imagine the IT Crowd episode where Jen tries to cram her foot into a shoe that is way too small).
I removed files, emails, anything that would bring the free space back up to 47%, surely enough to run backups.
Then my remote drive wouldn't mount, so I switched to another one using SSHFS rather than CIFS. Now nothing backed up anymore and my Nextcloud app lay broken, its logs claiming not to know who or what it was. Meanwhile the backups completed without error, pretending everything was OK when the reality was nothing of the sort.
I was stumped. What should I do now? Only a few hours were left in the day, and dinner time was approaching fast.
I did what every sysadmin desperate enough would do: recover to a new instance with the last known backup set that was 'good'.
'Good' meaning that the horrors and mistakes of the last 48 hours would be wiped out, soon to become nothing but a bad dream.
A dream too horrible to reimagine. I don't dare tell my children of the feeling of doom and dread, the shock that makes you stare into the middle distance for 9,000 yards, of a weekend not well spent.
But all this is behind me now and the children never need to know. My Cloudron runs happily on its new instance and the backups are purring.
As I slump back onto my sofa, my back meeting the comforting embrace of its pillows, I wonder what the next backup restore test day will bring, and fall asleep.
-
@3246 Wow. I'm trying to figure out what exactly went wrong...
The expected backup wasn't restorable because it wasn't found, right? And the expected backup wasn't found because it was never made, because the disk was already full. Did I get that right?
Or, was the backup not found because, even though it apparently DID run, it actually didn't because the disk was already full... right?
Or did the backup complete? And was then not found?
And the disk being full happened the day before, when you threw a ton of data at it. And despite clearing things out so that it seemed that there was enough space for a backup, somehow there actually wasn't, but this fact wasn't made clear?
And the initial backup (that ended up not being found) was just the Nextcloud app, right? Or was it the entire Cloudron initially, and then you thought to focus on the Nextcloud app because it is used most?
But then you end by saying the Cloudron is running happily, so was it just the NC app backup that didn't work, or was it the entire Cloudron?
And if this was just a test, why use the production server?
Maybe you could retell the tale with some more to-the-point details. At the moment it sounds like the full disk interfered with the backups (not Cloudron's responsibility, but a scenario that has been induced several times by others, including myself) but Cloudron told you the backups were successful (Cloudron's fault to give a faulty confirmation - if this is the case this is serious) when there actually wasn't a backup made (plus somehow Cloudron made it seem like there was room for a backup when there actually wasn't, in which case the expectation is that Cloudron wouldn't even try to make the backup, or at least fail with a notification of some sort).
Glad you are up and running though. That's a relief.
-
@scooke said in Rainy Sunday Tales:
The expected backup wasn't restorable because it wasn't found, right?
Kinda, yeah. The folder structure and metadata were there, but not the tarball with the actual files.
And the expected backup wasn't found because it was never made because the disk was already full, did I get that right?
No, this is after I cleaned it down.
Or, was the backup not found because, even though it apparently DID run, it actually didn't because the disk was already full... right?
No, I don't think so. This was afterwards when I ran it manually today.
Or did the backup complete? And was then not found?
That's the wicked bit: Cloudron marked it as completed successfully. But it wasn't.
And the disk being full happened the day before, when you threw a ton of data at it. And despite clearing things out so that it seemed that there was enough space for a backup, somehow there actually wasn't, but this fact wasn't made clear?
I think the free space after clearing up was true. I was able to do stuff again and all seemed OK.
And the initial backup (that ended up not being found) was just the Nextcloud app, right? Or was it the entire Cloudron initially, and then you thought to focus on the Nextcloud app because it is used most?
Actually, I tried a full server backup, because I had removed all the data I uploaded manually using the file manager. That may have led to the Nextcloud app getting borked.
But then you end by saying the Cloudron is running happily, so was it just the NC app backup that didn't work, or was it the entire Cloudron?
The entire installation seemed shot to me. I couldn't uninstall or install apps, the backup drive wouldn't mount, and so on, which I believe was left over from the disk being full. I restarted a couple of times, but it didn't help. Then I restarted services manually and things got better: well enough for a quick manual backup and migration to a fresh box.
And if this was just a test, why use the production server?
I was trying to take a manual backup from prod to test a restore to a local box. That's when I saw the ghost... I mean, found that all was not well.
Maybe you could retell the tale with some more to-the-point details. At the moment it sounds like the full disk interfered with the backups (not Cloudron's responsibility, but a scenario that has been induced several times by others, including myself) but Cloudron told you the backups were successful (Cloudron's fault to give a faulty confirmation - if this is the case this is serious) when there actually wasn't a backup made (plus somehow Cloudron made it seem like there was room for a backup when there actually wasn't, in which case the expectation is that Cloudron wouldn't even try to make the backup, or at least fail with a notification of some sort).
I hope this helps shed some light. I think the two biggest issues for me are:
- More actionable error messages
- Some form of protection from full disks (quotas?)
That, or perhaps more tea.
Glad you are up and running though. That's a relief.
Thank you! Hope you had a less eventful weekend.
-
@3246 said in Rainy Sunday Tales:
The manually triggered backup completed, and I sank into my chair the way a bowling ball sinks into a bean bag, not suspecting it was about to be thrown down the lane to meet nine pins of doom head-on.
@3246 it's a nice writing style, but for this context, citing the actual error messages / excerpts from the log, and your configuration would be more helpful, I think, unless you don't expect support in this issue and just want to tell a story.
-
@necrevistonnezr said in Rainy Sunday Tales:
@3246 it's a nice writing style, but for this context, citing the actual error messages / excerpts from the log, and your configuration would be more helpful, I think, unless you don't expect support in this issue and just want to tell a story.
Thanks, glad you enjoyed it. It's a story and not a support case for sure. I blame myself for the space and subsequent issues.
In a twist to this story, let me tell you that the missing backups were in fact on the server! They just never got copied to the backup volume.