Backups redundant?
-
I would like to know why Cloudron has a snapshot folder in the backup and a folder for each daily backup. This seems redundant, but surely there is a good reason for it. How does it work? I searched the forums but did not find a discussion about this. Maybe it's too obvious? Would love to understand this miracle...

-
@miednr Good question. Some background:
-
Uploading files is slow and expensive. In this context, "upload" means to put things in the backup destination. For S3, it's a network upload. For a (networked) disk, it is a copy of files over the network.
-
We made a decision that backup data has to be usable without any special tool. What this means is that if you go to your backup storage, you will see .tar.gz files or individual files (rsync). Basically, no special tool is needed if you want to move away from Cloudron. It's important for us to give the signal that you don't get locked in. For this reason, we don't do differential backups (which would involve some custom format).
Which brings us to the snapshot folder. Let's say your app has 50 files. On the first backup, we create a folder called snapshot and upload the 50 files there. Then we complete the backup process by creating a timestamped directory and copying the contents of snapshot into it. This "copying" is very cheap because there are APIs to do a "remote copy", so nothing has to be uploaded again. The timestamped directory is a complete standalone backup. It does not rely on anything else. You can just use normal tools to view the files.
After some time, your app has 52 files (it created 2 new files). For the next backup, we want to skip uploading the 50 unchanged files and only upload the 2 new ones. The backup system updates the snapshot folder with the 2 new files and then repeats the "remote copy" of the current snapshot into another timestamped directory. The key point is that only 2 files got uploaded (which is the expensive part).
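The two steps above can be sketched with local directories standing in for the backup destination. This is only an illustration, not Cloudron's actual code: the function names, the timestamped directory names, and the byte-for-byte "unchanged?" check are assumptions made for the example, and deleted files are not handled.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def sync_to_snapshot(app_dir: Path, snapshot_dir: Path) -> list[str]:
    """Copy only new or changed files into snapshot_dir; return what was "uploaded"."""
    uploaded = []
    for src in sorted(p for p in app_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(app_dir)
        dst = snapshot_dir / rel
        # Assumption: a byte comparison decides "unchanged". A real remote
        # backend would compare metadata instead of reading the remote file.
        if dst.is_file() and filecmp.cmp(src, dst, shallow=False):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # the slow, expensive "upload"
        uploaded.append(rel.as_posix())
    return uploaded


def finish_backup(snapshot_dir: Path, backup_root: Path, label: str) -> Path:
    """Stand-in for the cheap "remote copy" of snapshot into a timestamped dir."""
    target = backup_root / label
    shutil.copytree(snapshot_dir, target)
    return target


def demo() -> tuple[list[str], list[str]]:
    """Run two backups of a 50-file app that gains 2 files in between."""
    with tempfile.TemporaryDirectory() as tmp:
        app, root = Path(tmp, "app"), Path(tmp, "backups")
        app.mkdir()
        for i in range(50):
            (app / f"file{i:02d}.txt").write_text(f"data {i}")

        # First backup: everything is new, so all 50 files are uploaded.
        first = sync_to_snapshot(app, root / "snapshot")
        finish_backup(root / "snapshot", root, "2024-01-01-000000")  # made-up label

        # The app creates 2 new files; the second backup uploads only those.
        (app / "new_a.txt").write_text("a")
        (app / "new_b.txt").write_text("b")
        second = sync_to_snapshot(app, root / "snapshot")
        second_dir = finish_backup(root / "snapshot", root, "2024-01-02-000000")

        # The second timestamped directory is still a complete backup.
        assert len(list(second_dir.iterdir())) == 52
        return first, second


if __name__ == "__main__":
    first, second = demo()
    print(len(first), "files uploaded in the first backup")
    print(len(second), "files uploaded in the second backup:", second)
```

Each timestamped directory ends up with the full set of files even though the second run only transferred two of them.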
In the case of real disks, we use hard links between the snapshot and the timestamped dirs. Even if you have 100GB in the snapshot directory, there is only one copy of the files. Hard links give us a copy-on-write style filesystem.
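A small sketch of the hard link idea, using only the Python standard library. The directory names are made up, and the example assumes that a file in the snapshot is updated by replacing it with a new file rather than by modifying it in place; that assumption is what makes the copy-on-write comparison hold, and the source does not spell out how Cloudron does this step.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_snapshot(snapshot_dir: Path, target_dir: Path) -> None:
    """Recreate the directory tree, hard-linking every file instead of copying it."""
    shutil.copytree(snapshot_dir, target_dir, copy_function=os.link)


def replace_file(path: Path, content: str) -> None:
    """Update a file by writing a new one and renaming it over the old name."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(content)
    os.replace(tmp, path)  # 'path' now refers to a new inode


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp, "snapshot")
        snapshot.mkdir()
        (snapshot / "a.txt").write_text("v1")

        backup = Path(tmp, "2024-01-01-000000")  # made-up timestamped dir
        link_snapshot(snapshot, backup)

        # Both names point at the same data on disk: one copy, two links.
        same = os.path.samefile(snapshot / "a.txt", backup / "a.txt")
        links = (snapshot / "a.txt").stat().st_nlink

        # Replacing the snapshot file leaves the old backup untouched.
        replace_file(snapshot / "a.txt", "v2")
        return {
            "same_file_before": same,
            "link_count_before": links,
            "snapshot_now": (snapshot / "a.txt").read_text(),
            "backup_now": (backup / "a.txt").read_text(),
        }


if __name__ == "__main__":
    for key, value in demo().items():
        print(key, "=", value)
```

Note that writing into the existing file in place would change the content seen through both names, which is why the sketch replaces the file instead.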
For S3 and friends, storage costs are lower than ever. I think 1TB on Hetzner is just 5 bucks.
Maybe a better term for snapshot is scratchpad or working dir or something. But the folder name has stuck since we started 12 years ago!
-
-
@girish Thank you for this profound explanation. It's great that you are using hard links. I guess that's a real space and time saver. I did my first restores with Cloudron via the UI yesterday.
A) What does Cloudron internally use for the restore? Snapshot vs. Timestamp?
B) What could/should anyone use to restore without Cloudron? Snapshot vs. Timestamp?
This should be different, to make sense...
As far as I understand, A) is Snapshot and B) is Timestamp. Right?
-
@girish said in Backups redundant?:
We made a decision that backup data has to be usable without any special tool. What this means is that if you go to your backup storage, you will see .tar.gz files or individual files (rsync). Basically, no special tool is needed if you want to move away from Cloudron. It's important for us to give the signal that you don't get locked in. For this reason, we don't do differential backups (which would involve some custom format).
This made me wonder: what would one restore this to, if not Cloudron (it's kind of special!)?
-
@miednr said in Backups redundant?:
A) What does Cloudron internally use for the restore? Snapshot vs. Timestamp?
Always the timestamp! The snapshot is just a working directory.
B) What could/should anyone use to restore without Cloudron? Snapshot vs. Timestamp?
Always the timestamp. These are the standalone backups at a specific point in time.
-
@robi said in Backups redundant?:
This made me wonder: what would one restore this to, if not Cloudron (it's kind of special!)?
I mean that if someone wants to move apps out of Cloudron, they can use the backups to get the db dumps, config files and the data files.
Of course, there is no magic button or tool to migrate with a click from the backup to a non-Cloudron installation. You have to do sysadmin work to migrate away, but I think this is expected. There is no standardized format for these backups; it would have been great if there were one.
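As a rough illustration of "normal tools", here is a sketch that inspects a .tar.gz with nothing but Python's standard `tarfile` module. It assumes a plain, unencrypted archive. The archive built here is a tiny stand-in and its member names (`dump.sql`, `data/notes.txt`) are invented for the example; they are not the real layout of a Cloudron backup.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_example_archive(path: Path) -> None:
    """Build a tiny stand-in archive; the member names are made up."""
    files = {"data/notes.txt": b"hello", "dump.sql": b"-- sql dump"}
    with tarfile.open(path, "w:gz") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def list_members(path: Path) -> list[str]:
    """Return the names of all regular files in the archive."""
    with tarfile.open(path, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


def read_member(path: Path, name: str) -> bytes:
    """Read one file out of the archive without unpacking everything."""
    with tarfile.open(path, "r:gz") as tar:
        member = tar.extractfile(name)
        if member is None:
            raise ValueError(f"{name} is not a regular file")
        return member.read()


def demo() -> tuple[list[str], bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp, "example-backup.tar.gz")
        make_example_archive(archive)
        return list_members(archive), read_member(archive, "dump.sql")


if __name__ == "__main__":
    names, dump = demo()
    print(names)
    print(dump.decode())
```

What to do with the extracted dumps and data files afterwards is the sysadmin work mentioned above.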
-
Docs could benefit from having girish's explanation added, maybe with a nice Excalidraw diagram.
When there is time available (ha!)