[SOLVED] Lost all my files on nextcloud, need to better understand backups
Edit: Marking as solved, because even though I've really lost all my files, we understood what was going on and were able to take action to prevent it happening again.
So I've talked about running tests in my production environment, in which I accidentally stopped and removed my postgresql container. NextCloud was behaving badly for a while, until the devs helped me and I was back on track. I tried restoring backups, and it seemed to be okay, but I tried to access some files today and they're not there.
Went straight to the disk, and the files are not there.
These are files I've not touched for months now, so tried to restore an earlier backup, and still not there.
Went to s3 to check backup files, and, even though the nextcloud app says there's backups from 3 weeks ago, the latest backup on amazon is from a week ago, and they're all incremental (I use rsync format) and these files are not there.
I think s3 was configured to delete data after a while, but the limit was at least the same number of days that cloudron is configured for, or a bit more. I did this once because my cloudron was not deleting old backups and I was getting huge bills from Amazon. And now it seems there are JUST incremental backups there and no full backups.
How can these files just not be there? And how can cloudron be reporting backups from three weeks ago when it was configured for 1 week?
An App backup that was created right before an app update is also marked as special and persisted for 3 weeks. The rationale is that sometimes while the app itself is working fine, some errors/bugs only get noticed after a couple of weeks.
Maybe those are backups created before an update and were about to be deleted. Could it be that your s3 config was deleting files from the snapshot folder because they didn't change for a while? If so, you'd need to adjust that setting and let cloudron take care of the retention.
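To make the retention rule quoted above concrete, here is a minimal sketch of how a 1-week setting and the 3-week pre-update exception could combine. This is not Cloudron's actual code: the names Backup and is_retained are hypothetical, and the 7-day value is just the setting mentioned in this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical model of the retention rule quoted above; not Cloudron's real code.
REGULAR_RETENTION = timedelta(days=7)      # assumed: the 1-week setting from this thread
PRE_UPDATE_RETENTION = timedelta(days=21)  # "persisted for 3 weeks"


@dataclass
class Backup:
    created: datetime
    pre_update: bool  # True if the backup was taken right before an app update


def is_retained(backup: Backup, now: datetime) -> bool:
    """Return True if the backup would still be kept at time `now`."""
    limit = PRE_UPDATE_RETENTION if backup.pre_update else REGULAR_RETENTION
    return now - backup.created <= limit
```

Under these assumptions, a regular backup that is 20 days old would already be gone, while a pre-update backup of the same age would still be listed, which would explain seeing a 3-week-old backup with a 1-week setting.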
It makes sense that backup might be from an update, so thanks for clarifying that.
My s3 bucket was not set up to delete stuff that hasn't changed. I'm using the same settings I used before, and it used to be that I could restore from backups with no problems, but that was a good guess, I've just re-checked and there are no policies applied.
What's weird to me now is that there are NO full backups inside the bucket. If my cloudron were to crash now, I'd just lose all my data, and I don't know how long it has been like that. How can it simply not have most of the data?
I'd love to know what's going on so I can trust these backups again and go back to storing important stuff on my nextcloud instance. I'm pretty much the only user, but I've lost everything that I was storing over there, and there was some pretty important stuff to me. I've come to terms with having lost everything, shit happens, but I'd like to be able to trust it again, is all.
Thanks for helping out, though.
And now it seems there are JUST incremental backups there and no full backups
@malvim Cloudron always does full backups, we don't have incremental backups. What's incremental is just the "upload" part (but this is an implementation detail). From the user's point of view, all backups are self-contained.
Since you are using rsync, are you saying, the backups on s3 are missing some files or are all files missing altogether? Also, if you go to the Activity Log and filter by backup, do you see any errors or completion events?
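One way to answer the "some files or all files" question is to compare the files on disk against a listing of what is in the backup. The sketch below is only an illustration: missing_from_backup is a hypothetical helper, and it assumes you already have the backup's object keys as a set of paths relative to the same root as the local data directory (how you obtain that listing, and whether the backup layout mirrors the local one, is up to your setup).

```python
import os
from typing import List, Set


def missing_from_backup(local_root: str, backup_keys: Set[str]) -> List[str]:
    """Return relative paths that exist under local_root but not in backup_keys."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), local_root)
            rel = rel.replace(os.sep, "/")  # object keys use forward slashes
            if rel not in backup_keys:
                missing.append(rel)
    return sorted(missing)
```

An empty result means every local file has a counterpart in the listing; a long result is the situation described in this thread.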
@girish Hey! I'll check the completion events, but I went on s3 and the directories inside my backup buckets contain only a few files each.
Given that this is at least the second time this has happened, I will add a warning in the UI when a user selects rsync in the backup configuration.
@girish Yeah, I used to have lifecycle rules because cloudron at one point was not deleting older backups and I started getting huge bills from Amazon because of that, so I had to make sure stuff was deleted.
I also tried the other format for backups, but we were hitting a lot of timeouts and backups were failing, I remember you helped me out with this and I ended up choosing rsync because of that as well.
I removed the lifecycle rule now, but at this point there's just a bunch of corrupted backups and no full backup, and I'd like to not run into this again. How should I proceed to re-start the backup process correctly this time?
@malvim Best way is to change the format to tgz first. Click Save. Then Configure again and click rsync. This is a "hint" to the Cloudron UI to clean up old stuff/cache etc. Now, make a backup. All the files should appear in snapshots/.
BTW, with all the backup config options we put in the previous release, maybe you can just use tgz? I suggest this because S3 charges per API request which can be pricey if you have nextcloud with a lot of files. With tgz, we make only one API request compared to 100s of thousands in rsync mode.
I have also put a warning now in the UI to remove any lifecycle rules.
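A small simulation shows why an age-based lifecycle rule and rsync-format backups don't mix. It assumes, as discussed in this thread, that unchanged files are not re-uploaded, so their objects keep an old timestamp, and that the rule expires objects purely by age. The function surviving_keys and the object keys are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def surviving_keys(last_modified: Dict[str, datetime],
                   expire_after_days: int,
                   now: datetime) -> List[str]:
    """Keys that an age-based expiration rule would leave in place at `now`."""
    cutoff = now - timedelta(days=expire_after_days)
    return sorted(key for key, ts in last_modified.items() if ts > cutoff)


now = datetime(2020, 3, 1)
snapshot = {
    "snapshot/app/old-photo.jpg": datetime(2019, 10, 1),  # untouched for months
    "snapshot/app/notes.txt": datetime(2020, 2, 27),      # changed last week
}
print(surviving_keys(snapshot, expire_after_days=10, now=now))
# prints ['snapshot/app/notes.txt']
```

The file that hasn't been touched for months is exactly the one the rule removes, leaving a backup that contains only recently changed files.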
@girish Yeah, I think I might change to TGZ now... There were about 50GB of data on NextCloud in a quite large number of files. Along with all the other apps (like gitea with a bunch of repos, a couple of wordpress installs with some content), I thought maybe that was too large for tgz backups, and I think I maybe even talked to you on the chat and we thought rsync was the way to go in my case.
If you think that's not the case anymore and tgz could handle some tens of GB of data well, I'll be happy to switch and try it.
@malvim Yes, it should be able to handle up to 100GB easily. I have even seen it being used for ~200GB reliably.
@girish Sounds good, thanks.