Faster backups or more configuration options for backups
-
Problem:
The current feature set only allows excluding apps and scheduling the frequency of backups.
At the same time, all apps are backed up one after the other, and the backup speed is rather slow at 3-7 MB/s.
This leads to two problems:
- Apps with many and/or large files (Nextcloud) take longer than the configured backup interval (12 hours in my case). As a result, I have no backups for any of the other apps, or the backup run aborts completely.
- Apps with a high change rate need frequent backups, but these are blocked by the apps with large files (Nextcloud).
Feature proposal:
- Backups should be accelerated, e.g. by running the transfer for several apps in parallel (see the sketch after this list).
- There should be database backups with separate schedules (in many cases one could do without the files, but not without the database changes).
- There should be different schedules per app (Nextcloud once a week, Redmine every 2 hours).
- It should be possible to exclude external volumes from the backup.
I would be happy to see some improvement on this topic.
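To make the first point concrete, here is a minimal sketch of what I mean by parallel per-app transfers. The app names and paths are made up for illustration; this is just the idea, not how Cloudron's backup code actually works.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical app list and paths, purely for illustration.
APPS = ['gitlab', 'mattermost', 'redmine', 'nextcloud']
CONCURRENCY = 3  # how many app transfers run at the same time

def backup_app(app):
    src = f'/apps/{app}/data/'   # placeholder source path
    dst = f'/mnt/backup/{app}/'  # placeholder destination path
    subprocess.run(['rsync', '-a', '--delete', src, dst], check=True)
    return app

# Instead of one app after the other, run several transfers at once.
with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    for done in pool.map(backup_app, APPS):
        print(f'{done}: backed up')
```

This way, one slow 200 GB app would no longer hold up the small, fast-changing ones.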
-
To illustrate the problem a little more concretely:
I have about 10 apps in my Cloudron. The most important are Gitlab, Mattermost, Redmine and Nextcloud.
Mattermost and Redmine are constantly changing, and a backup that is too old can be fatal. Therefore I would like frequent backups, at least every 12 hours.
Nextcloud is different. It's all about data exchange, and the files are usually still available locally on all clients. So here it would be enough to have a backup every few days.
If the creation of backups were very fast, the story would end here. Then I could just have a backup made every few hours.
But: Nextcloud holds 200 GB (external volume included), so a backup runs for a very long time (I have already seen a runtime of more than 24 hours). As a consequence, no backup of the other apps runs either. And if I deactivate the regular backup for Nextcloud, the backup before an update will take much longer!
Currently I'm doing backups with rsync. I have also tested tgz, but it wasn't faster and was much more expensive (more S3 requests?).
-
Odd that the rsync option doesn't cover this; I guess you have data-growth-happy users.
Out of interest, what's the destination?
I'm also wondering if the multi-Cloudron features in the works for 6.0 might be another solution, so that each VM has settings specific to its apps' needs. Although I guess that would depend on whether one Cloudron can be set as the master LDAP service, to keep user management a one-time thing.
-
I believe some of this is already planned for 5.5 or 6.0. See the announcement here: https://forum.cloudron.io/topic/2918/what-s-coming-in-5-5 and you'll see a note that reads "Backup upload/download speed - Currently, backups can be quite slow but we have some ideas to speed it up" -- this only addresses part of what you're reporting, but I think some of what you're asking for is already in the pipeline.
-
@simon Have you tried rsync with hardlinks enabled? Hardlinks preserve space, and maybe it's just in my head, but I swear they speed things up a bit too. This will also depend on how fast your external disk can be accessed.
I have about 29 apps running right now, and my backups are very quick (after the initial backup, anyway) using rsync and hardlinks to an ext4 hard disk that's mounted directly (as opposed to via CIFS or NFS, etc.). Of course, I don't have 200 GB of data either, so with far more data the number of apps may not matter. But I can say that for me, running a server with 29 apps and roughly 35 GB of data on the primary hard disk, backups complete very quickly. I'd estimate they generally take about two minutes or less, depending on whether there's a WordPress app package update, since much more gets backed up after those are updated.
If I were to extrapolate from my usage: say I had 200 GB of data, about five times what I use now; then a backup would still potentially take only about ten minutes to complete (assuming a linear trend), which still seems quite speedy to me.
How long are backups taking to complete on your system? If it's a lot longer than ten minutes, then maybe there are performance issues on the disk you're backing up to, or you're not using rsync with hardlinks?
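For reference, here's a minimal sketch of the classic rsync --link-dest snapshot pattern, which is what I understand "rsync with hardlinks" to mean. All paths are placeholders; this is the shape of the idea, not Cloudron's actual code.

```python
import datetime
import os
import subprocess

# Placeholder paths; the pattern is the standard rsync --link-dest snapshot scheme.
SRC = '/apps/myapp/data/'
DEST_ROOT = '/mnt/backup/myapp'

stamp = datetime.datetime.now().strftime('%Y-%m-%dT%H-%M-%S')
snapshot = os.path.join(DEST_ROOT, stamp)
latest = os.path.join(DEST_ROOT, 'latest')

cmd = ['rsync', '-a', '--delete']
if os.path.islink(latest):
    # Unchanged files are hardlinked to the previous snapshot instead of
    # copied, which is why incremental runs are fast and use little space.
    cmd += ['--link-dest', os.path.realpath(latest)]
cmd += [SRC, snapshot]
subprocess.run(cmd, check=True)

# Atomically repoint 'latest' at the new snapshot for the next run.
tmp = latest + '.tmp'
os.symlink(snapshot, tmp)
os.replace(tmp, latest)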
-
To give some update on the tech side (if you are interested in this) about the speed issues: Cloudron code runs in a very memory-constrained environment of 400 MB. Even if your server has 10 GB of RAM, this limit is hardcoded. It covers the full platform code (i.e. the server side of the dashboard) as well as all tasks (backups, certs, app installs) and cron jobs. What I found with some profiling is that things get faster if we run these tasks with more memory. So I am reworking the task code in 5.5 so that each task runs in its own cgroup. Currently, this is the only change pending for 5.5. Hopefully it gets done this week.
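To illustrate the idea (not the actual Cloudron implementation), a task can get its own memory budget with cgroup v2 by creating a group, setting memory.max, and moving the child process into it. Group name, limit, and task command below are made up, and this needs root:

```python
import os
import subprocess

# Assumes cgroup v2 mounted at /sys/fs/cgroup and root privileges;
# group name and memory limit are made up for illustration.
CGROUP = '/sys/fs/cgroup/backup-task'
os.makedirs(CGROUP, exist_ok=True)

# Give the task a 2 GiB budget instead of sharing the platform's 400 MB.
with open(os.path.join(CGROUP, 'memory.max'), 'w') as f:
    f.write(str(2 * 1024 ** 3))

def enter_cgroup():
    # Runs in the child between fork() and exec(), so the memory
    # limit applies from the very start of the task.
    with open(os.path.join(CGROUP, 'cgroup.procs'), 'w') as f:
        f.write(str(os.getpid()))

task = subprocess.Popen(['node', 'task.js'], preexec_fn=enter_cgroup)  # placeholder command
task.wait()
```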
Apart from that, the way I read your proposal is:
- per-app backup schedule (doable)
- parallel backups (also doable, and only possible after the above change in 5.5, since running parallel tasks in 400 MB won't work)
- split the file backups from the database backups. I understand the problem you are facing, where this one big app is "blocking" every other app from backing up properly. But I don't think this can be fixed by skipping the files to speed things up, since for most apps (except maybe emby/jellyfin), the files are almost an extension of the database. If you snapshot files and database at different points in time and stitch them together, things will break... Let's think about other ideas here.
-
One subtle thing that makes a big difference is how many reads and writes are happening.
Remember back in the DOS days copying floppies?
Long reads into a big buffer and long writes from that buffer were significantly faster than going back and forth, bzt bzt. The same applies when many small files are written to a backup; hence streaming backups are one of the better ways to avoid this.
Another example is interfacing with object stores. Most S3 connectors are designed from a POSIX-file point of view, and they kill the object store with many small writes when it should be a few large ones.
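To sketch what "a few large writes" can look like in practice: stream the archive and push it to S3 as large multipart chunks instead of one PUT per small file. Bucket, key, and paths below are placeholders; this is just the shape of the idea.

```python
import io
import tarfile
import boto3

PART_SIZE = 64 * 1024 * 1024  # 64 MiB chunks: a few large writes, not many small ones

class S3MultipartWriter(io.RawIOBase):
    """File-like object that turns a stream of writes into large S3 multipart uploads."""

    def __init__(self, s3, bucket, key):
        self.s3, self.bucket, self.key = s3, bucket, key
        self.upload_id = s3.create_multipart_upload(Bucket=bucket, Key=key)['UploadId']
        self.buf = bytearray()
        self.parts = []

    def writable(self):
        return True

    def write(self, data):
        self.buf += data
        while len(self.buf) >= PART_SIZE:
            self._flush(self.buf[:PART_SIZE])
            del self.buf[:PART_SIZE]
        return len(data)

    def _flush(self, chunk):
        num = len(self.parts) + 1
        resp = self.s3.upload_part(Bucket=self.bucket, Key=self.key, PartNumber=num,
                                   UploadId=self.upload_id, Body=bytes(chunk))
        self.parts.append({'ETag': resp['ETag'], 'PartNumber': num})

    def close(self):
        if self.closed:
            return
        if self.buf:
            self._flush(self.buf)  # the final part may be smaller than PART_SIZE
        self.s3.complete_multipart_upload(Bucket=self.bucket, Key=self.key,
                                          UploadId=self.upload_id,
                                          MultipartUpload={'Parts': self.parts})
        super().close()

# Placeholder bucket/key/paths; the tar stream never touches the local disk.
s3 = boto3.client('s3')
with S3MultipartWriter(s3, 'my-backup-bucket', 'nextcloud.tar') as out:
    with tarfile.open(fileobj=out, mode='w|') as tar:
        tar.add('/apps/nextcloud/data')
```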
As for the large Nextcloud backup, why isn't it able to do incrementals?
-
For tgz, what we found is that the slowness is mostly due to the gz part, and most cloud VPSes are not very fast at this. And tgz, by its nature, is single-core.
For rsync, parallelism and buffer size were indeed a constraint, but these are both configurable in 5.5 and 5.6. Note that the concurrency here also depends a lot on the storage backend. For example, DO can handle only 20 concurrent requests at a time, while S3 can handle thousands. One has to experiment with the values a bit to figure out the right number; unfortunately, most S3 connectors don't publish ideal sizes and concurrency.
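Since the ideal numbers aren't published, one crude way to find them is to benchmark the backend directly. A throwaway sketch (bucket name, object size, and concurrency values are placeholders; note a backend like DO may throttle at the higher settings):

```python
import time
from concurrent.futures import ThreadPoolExecutor
import boto3

# Placeholder bucket; 8 MiB objects roughly mimic backup chunks.
s3 = boto3.client('s3')
PAYLOAD = b'x' * (8 * 1024 * 1024)
OBJECTS = 40

def put(i):
    s3.put_object(Bucket='my-backup-bucket', Key=f'bench/obj-{i}', Body=PAYLOAD)

# Try a few concurrency levels and see where throughput stops improving.
for concurrency in (5, 20, 100):
    start = time.time()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(put, range(OBJECTS)))
    rate = OBJECTS * 8 / (time.time() - start)
    print(f'concurrency={concurrency}: {rate:.0f} MiB/s')
```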
-