Backups redundant?
-
I would like to know why Cloudron has a snapshot folder in the backup and a folder for each daily backup. This seems redundant, but surely it makes sense. But how does it work? I searched the forums but did not find a discussion about this. Maybe it's too obvious? Would love to understand this miracle...

-
@miednr Good question. Some background:
-
Uploading files is slow and expensive. In this context, "upload" means to put things in the backup destination. For S3, it's a network upload. For a (networked) disk, it is a copy of files over the network.
-
We made a decision that backup data must be usable without any special tool. What this means is that if you go to your backup storage, you will see .tar.gz files or individual files (rsync). Basically, no special tool is needed if you want to move away from Cloudron. It's important for us to signal that you don't get locked in. For this reason, we don't do differential backups (which would involve some custom format).
Which then brings us to the snapshot folder. Let's say your app has 50 files. On the first backup, we create a folder called snapshot and upload the 50 files there. Then, we complete the backup process by creating a timestamped directory and copying the contents of snapshot into the timestamped directory. This "copying" is very cheap because there are APIs to do a "remote copy". The timestamped directory is a complete standalone backup. It does not rely on anything else. You can just use normal tools to view the files.
After some time, your app has 52 files (it created 2 new files). For the second/next backup, we want to skip uploading the 50 unchanged files and only upload the two new ones. The backup system updates the snapshot folder with the 2 new files and repeats the copy of the current snapshot to another timestamped directory with "remote copy". The key here is that only 2 files got uploaded (which is the expensive part).
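To make the 50-then-52 files example concrete, here is a minimal sketch in Python. It is not Cloudron's code: `FakeDestination` and `backup` are hypothetical names, the destination is just an in-memory dict, and "changed" is detected by comparing contents directly (a real tool would more likely compare sizes, timestamps or checksums). Deleted files are not handled.

```python
class FakeDestination:
    """In-memory stand-in for a backup destination (hypothetical, for illustration)."""

    def __init__(self):
        self.objects = {}       # path -> file contents
        self.uploads = 0        # expensive: data travels from the server
        self.remote_copies = 0  # cheap: data is copied inside the destination

    def upload(self, path, data):
        self.objects[path] = data
        self.uploads += 1

    def remote_copy(self, src, dst):
        self.objects[dst] = self.objects[src]
        self.remote_copies += 1


def backup(dest, app_files, timestamp):
    """Sync app_files into snapshot/, then remote-copy snapshot/ to timestamp/."""
    prefix = "snapshot/"
    # What the snapshot folder already holds from earlier backups.
    existing = {
        path[len(prefix):]: data
        for path, data in dest.objects.items()
        if path.startswith(prefix)
    }
    # Upload only the files that are new or whose contents differ.
    for name, data in app_files.items():
        if existing.get(name) != data:
            dest.upload(prefix + name, data)
    # The timestamped directory is a full, standalone copy of the snapshot.
    for name in app_files:
        dest.remote_copy(prefix + name, f"{timestamp}/{name}")


dest = FakeDestination()
files = {f"file{i}.txt": b"data" for i in range(50)}

backup(dest, files, "2024-01-01")
first_uploads = dest.uploads

files["new1.txt"] = b"a"
files["new2.txt"] = b"b"
backup(dest, files, "2024-01-02")
second_uploads = dest.uploads - first_uploads

print(first_uploads, second_uploads)
```

The second run uploads only the two new files, yet the second timestamped directory still contains all 52 files, because the remote copy is taken from the full snapshot.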
In the case of real disks, we use hard links between the snapshot and timestamped dirs. Even if you have 100GB in the snapshot directory, there is only one copy of the files. Hard links give us a copy-on-write style filesystem.
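The hard link idea can be tried out with plain Python on a local disk. This is only a sketch with made-up directory and file names, not how Cloudron lays out its backups. It also assumes that a changed file in the snapshot is replaced by a new file rather than rewritten in place; that assumption is what keeps the older timestamped copy intact.

```python
import os
import tempfile

root = tempfile.mkdtemp()
snapshot_dir = os.path.join(root, "snapshot")
stamped_dir = os.path.join(root, "2024-01-01")
os.makedirs(snapshot_dir)
os.makedirs(stamped_dir)

# One file in the snapshot directory.
src = os.path.join(snapshot_dir, "data.bin")
with open(src, "wb") as f:
    f.write(b"x" * 1024)

# "Copy" it into the timestamped directory as a hard link: no data is duplicated.
dst = os.path.join(stamped_dir, "data.bin")
os.link(src, dst)

same_inode = os.stat(src).st_ino == os.stat(dst).st_ino
link_count = os.stat(src).st_nlink  # two names now point at the same data

# Later the app changes the file. Write the new version to a temporary name
# and swap it in, so the snapshot name points at a fresh file.
tmp = src + ".tmp"
with open(tmp, "wb") as f:
    f.write(b"new contents")
os.replace(tmp, src)

with open(dst, "rb") as f:
    old_preserved = f.read() == b"x" * 1024

print(same_inode, link_count, old_preserved)
```

After the swap, the timestamped directory still holds the old contents while the snapshot holds the new ones, which is the copy-on-write style behaviour described above.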
For S3 and friends, storage costs are lower than ever. I think 1TB on Hetzner is just 5 bucks.
Maybe a better term for snapshot is scratchpad or working dir or something. But the folder name has stuck since we started 12 years ago!
-
-
@girish Thank you for this thorough explanation. It's great that you are using hard links. I guess that's a real space and time saver. I did my first restores with Cloudron via the UI yesterday.
A) What does Cloudron internally use for the restore? Snapshot vs. Timestamp?
B) What could/should anyone use to restore without Cloudron? Snapshot vs. Timestamp?
This should be different, to make sense...
As far as I understand, A) is Snapshot and B) is Timestamp. Right?