Long backups, local and remote, failing consistently
-
(I have a suspicion that this is a variation on this post from a while back.)
I have configured backups as follows:
| backup set | encr? | target | day | time | files | size |
|---|---|---|---|---|---|---|
| bitwarden | Y | storage box | daily | 20:00 | 800 | 7MB |
| photos | N | storage box | S | 03:00 | 300K | 200GB |
| photos | N | NAS | Su | 03:00 | 300K | 200GB |
| full (- music, - photos) | Y | NAS | MWF | 03:00 | 18K | 12GB |
| music | N | NAS | T | 03:00 | ? | 600GB |

What I'm finding is that my Immich (photos) instance does not want to back up. To be more precise: Immich consistently fails a long way into the backup. In both the case where it is talking to a storage box (overseas, for me) and the case where it is talking to my local NAS, the target is configured as an SSHFS mount. In each location I have set up a folder called `$HOME/backups`, and used a subpath for each backup (e.g. `photos`, so that the full path becomes `$HOME/backups/photos`, `$HOME/backups/vaults`, etc.). In all cases, I'm using `rsync` with hardlinks.
I removed the photos (a large set with many files) and the music from the full backup set, because I want to target them separately for backup. And, I want to make sure my full backup completes.
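Roughly, the shape of that setup, with hypothetical host names and paths (the SSHFS options shown are just the commonly suggested keep-alive/reconnect flags, and the `rsync` invocation is only my mental model of the hardlink scheme, not necessarily what Cloudron runs internally):

```bash
# Hypothetical hosts/paths, for illustration only.
# Mount the backup target over SSHFS; reconnect/keep-alive options are the
# usual suggestions for long-running transfers.
sshfs backup@nas.lan:/home/backup/backups /mnt/nas-backups \
    -o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3

# Snapshots then land under a subpath, e.g. /mnt/nas-backups/photos/<timestamp>.
# A hand-rolled equivalent of the hardlink scheme against the previous snapshot:
rsync -aH --delete \
    --link-dest=/mnt/nas-backups/photos/2026-02-09-030000 \
    /path/to/immich/data/ \
    /mnt/nas-backups/photos/2026-02-10-030000/
```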
I can back up the bitwarden instance, because it is small. I have not yet seen the `photos` backup complete. I end up somewhere around 290K files in, and then an SSH error drops the connection. I don't know what the root cause is. (And, I'm now waiting for another backup, because Immich kicked off an update... so, I have to wait.) I'll update this thread if/when it fails again.
Possible root causes (that would be difficult for me to work around; a quick way to check the first two is sketched after this list):
- Too many files. I would think `rsync` would have no problems.
- Files changing. Immich likes to touch things. Is it paused during backup? If not, could that be the problem? (There are temp files that get created as part of its processes; could those be in the set, then get processed/deleted before the backup gets to them, and then break the backup? But pausing during backups is disruptive/not appropriate for a live system, so... that's not actually a solution path. Ignore me.)
- Not enough RAM. Do I need to give the backup process more RAM?
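For my own later checking, a rough way to test the first two hypotheses; the data path is a guess at where the Immich app data lives on the Cloudron host, not verified:

```bash
# Hypothetical path; substitute the real Immich app data directory.
APPDATA=/home/yellowtent/appsdata/<immich-app-id>/data

# Hypothesis 1: how many files are actually in the backup set?
find "$APPDATA" -type f | wc -l

# Hypothesis 2: is Immich churning files while the backup runs?
# List files modified in the last hour.
find "$APPDATA" -type f -mmin -60 | head -n 50
```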
The NAS is a TrueNAS (therefore Debian) machine sitting next to the Cloudron host. Neither seems to be under any kind of RAM pressure that I can see. Neither is doing anything else of substance while the backups are happening.
Unrelated: I do not know what happens when Immich updates, because I am targeting it with two backup points. Does that mean an app update will trigger a backup to both locations? Will it do so sequentially, or simultaneously?
Possible other solutions
I would like the SSHFS backup to "just work." But, I'm aware of the complexity of the systems involved.
Other solutions I could consider:
1. Use object storage. I don't like this one. When using `rsync` with many files, I discovered that (on B2) I could end up paying a lot for transactions if I had a frequent backup, because `rsync` likes to touch so many things. This was the point of getting the NAS. (A back-of-envelope cost sketch follows this list.)
2. Run my own object storage on the NAS. I really don't want to do that. And, it doesn't solve my off-site photos backup.
3. Introduce JuiceFS on the Cloudron host. I could put JuiceFS on the Cloudron host. I dislike this for all of the obvious reasons. But, it would let me set up an SSHFS mount to my remote host, and Cloudron/`rsync` would think it was a local filesystem. This might only be pushing the problems downwards, though.
4. Back up locally, and rsync the backup. I think I have the disk space for this. This is probably my most robust answer, but it is... annoying. It means I have to set up a secondary layer of `rsync` processes. On the other hand, I have confidence that if I set up a local volume, the Cloudron backup will "just work."
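The back-of-envelope for the object-storage option, with an assumed (not quoted) per-transaction rate:

```bash
# Illustrative arithmetic only; the per-call price is an assumption, not a quote.
FILES=300000          # roughly the size of the photos set
PRICE_PER_1000=0.004  # assumed $/1,000 API calls; check current provider pricing
RUNS_PER_MONTH=30     # a daily backup
awk -v f="$FILES" -v p="$PRICE_PER_1000" -v r="$RUNS_PER_MONTH" \
    'BEGIN { printf "~$%.2f/month in transaction fees alone\n", (f / 1000) * p * r }'
```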
Ultimately, I'm trying to figure out how to reliably back things up. I think #4 is my best bet.
-
The Immich (photos) backup ended as follows.
    Feb 10 03:11:21 box:backupformat/rsync sync: adding data/upload/upload/d354571e-1804-4798-bd79-e29690172c14/d9/d7/d9d762ae-5a69-461d-9387-84882f110276.jpg.xmp position 227458 try 1
    Feb 10 03:11:21 box:backupformat/rsync sync: processing task: {"operation":"add","path":"data/upload/upload/d354571e-1804-4798-bd79-e29690172c14/d9/d7/d9d762ae-5a69-461d-9387-84882f110276.jpg.xmp","reason":"new","position":227458}
    Feb 10 03:11:21 Exiting with code 70
    Feb 10 03:11:21 box:taskworker Terminated
    Feb 10 05:03:04 13:M 10 Feb 2026 10:03:04.004 * 10 changes in 300 seconds. Saving...
    Feb 10 05:03:04 13:M 10 Feb 2026 10:03:04.004 * Background saving started by pid 298

I do not know for certain if this was the local or remote backup. Locally, the snapshot folder is dated Feb 9 03:13, and remotely it is dated Feb 9, 02:35. Those... appear to be the created times, using `ls -ac`.
According to the logs, my music backup ran Tuesday at 3 AM, and it completed in 1m30s or thereabouts. So, it finished about 10 minutes before this failure. The music backup would be against the NAS.
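(For my own notes: `ls -c` actually shows ctime, the inode change time, rather than a true creation time. `stat` shows all of the timestamps at once; the path below is a placeholder.)

```bash
# Show access/modify/change times, and birth time where the filesystem records it.
stat --printf 'access: %x\nmodify: %y\nchange: %z\nbirth:  %w\n' \
    "$HOME/backups/photos/<snapshot-folder>"
```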
Immich still wants to update.
Are there any thoughts as to what I should consider doing to get to a successful backup of my photos?
Absent a way for Cloudron to successfully back up Immich, I feel like the following are my options:
- JuiceFS would probably let `rsync` complete and support hardlinks. I would create an SSHFS mount via Juice from a folder on my localhost -> the target system. Then, I would mount that folder as a local volume (filesystem). As far as Cloudron would be concerned, it would be a regular filesystem. Downside? It's a moving piece in between me and my files, and a point for data loss.
- I could use object storage, but I'm concerned about operation costs. An rsync -> object store approach with this many files means... probably hundreds of thousands of API calls for every backup. Depending on the provider, that ends up costing.
- Use `tar`? I feel that a tarball is really inefficient, since the photos don't change often/at all.
- Back up locally and rsync the backup. This would eat disk, but I have space to spare on the Cloudron host; it runs on a mirrored 8TB pair. If I keep three backups (monthly), I would end up with nearly a TB of data, but I could rsync that to the NAS and remote. The rotation would happen locally, I'd get off-site and local backups, and the cost would be that each photo takes 4x the space (original + 3x copies on the local filesystem for rsync rotation). (A rough sketch of this approach follows this list.)
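A rough sketch of that last option, with hypothetical paths and hosts (Cloudron would write the local backup; this only mirrors it outward afterwards):

```bash
#!/usr/bin/env bash
# Sketch of "back up locally, then push the backup out". Paths and hosts are
# hypothetical; adjust to the actual local backup volume and targets.
set -euo pipefail

LOCAL_BACKUPS=/mnt/backups                              # local volume Cloudron backs up to
NAS=backup@nas.lan:/home/backup/backups                 # local NAS target
REMOTE=backup@storagebox.example:/home/backup/backups   # off-site target

# Mirror the finished local backups. -aH preserves hardlinks between
# snapshots, so the rotated copies stay cheap on the targets.
rsync -aH --delete "$LOCAL_BACKUPS/" "$NAS/"
rsync -aH --delete "$LOCAL_BACKUPS/" "$REMOTE/"
```

This would run from cron some time after the Cloudron backup window, e.g. `0 5 * * * /usr/local/bin/mirror-backups.sh` (script name hypothetical).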
-
@james , do you have any thoughts?
I had to reboot the server for updates yesterday; as a result, the Immich app is (again) trying to backup. It is now 14K into another attempt. I have every belief that it will fail some 250K files into the backup.
Do any of the strategies I've brainstormed sound better than the others from y'all's perspective?
We can leave this thread open as I explore, but I think the answer is "I can't back up my photos by simply adding an SSHFS backup location." I apparently have to solve this some other way.
-
Hello @jadudm
@jadudm said in Long backups, local and remote, failing consistently:
Unrelated: I do not know what happens when Immich updates, because I am targeting it with two backup points. Does that mean an app update will trigger a backup to both locations? Will it do so sequentially, or simultaneously?
This depends on the backup site configuration.
You can configure what each backup site should back up.
@jadudm said in Long backups, local and remote, failing consistently:
Feb 10 03:11:21 Exiting with code 70
This could indicate a lack of memory, so yes, maybe increasing the backup memory size could help.
Since SSHFS with rsync backups are incremental, I am also wondering why it takes so long.
A question about your local TrueNAS connection.
Is your Cloudron server hosted with a provider like Hetzner, or is it also hosted locally in the same network as your TrueNAS?
If hosted in the cloud, maybe the connection between the Cloudron server and your local TrueNAS has some issue.
This could be anything from your internet service provider throttling the connection, to DNS issues, to hardware issues like the network interface and cable on the TrueNAS.
I assume your TrueNAS has a default 1 Gbit port and cable and is connected to the router directly.
If so, maybe check that the network interface on the TrueNAS is really running at 1 Gbit and not falling back to 100 Mbit due to a bad cable.
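If it helps, two quick checks (the interface name is just an example and may differ on your hardware, and `iperf3` may need to be installed on both machines):

```bash
# On the TrueNAS: confirm the negotiated link speed and duplex.
ethtool eno1 | grep -E 'Speed|Duplex'

# Rough throughput test between the two machines:
#   on the TrueNAS:       iperf3 -s
#   on the Cloudron host: iperf3 -c nas.lan
```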
-
Good questions. The configuration locally is that the machines all live behind an OpnSense router. Cloudron is hosted on a VM on a small machine (and has 24GB of RAM allocated to it, and does not show signs of RAM pressure), and the NAS itself is running TrueNAS w/ 40GB of RAM available (it is never under RAM pressure, as far as I can tell).
cloudron.lan -> switch -> nas.lan
Both machines are local. The cables could be poor; I can check. This is why I think the SSHFS failure on the Cloudron -> NAS connection is so worrying; there's no good reason why it should fail, from what I can tell.
I can... understand that the SSHFS backup to the storage box might be troublesome, given the distances involved. The local connection, though, should "just work."
I'll dig more into possible memory issues.
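As a first step on the memory angle, I'll look for OOM-killer activity around the failure window and check headroom (times taken from the log above):

```bash
# Any out-of-memory kills around the Feb 10 03:11 failure?
journalctl -k --since "2026-02-10 03:00" --until "2026-02-10 03:30" | grep -iE 'out of memory|oom'

# Current memory headroom on the host.
free -h
```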
-
Interesting. I think I had missed that setting before.
I tried two things, but now need to head to work.
I created an SMB share on the NAS. I was able to establish a backup site... and, I just re-created an SSHFS mount per above, and gave it 6GB of RAM.
    Feb 11 09:16:30 box:taskworker Starting task 9902. Logs are at /home/yellowtent/platformdata/logs/tasks/9902.log
    Feb 11 09:16:30 box:taskworker Running task of type backup
    Feb 11 09:16:30 box:backuptask fullBackup: skipped backup ...
    Feb 11 09:16:30 box:tasks updating task 9902 with: {"percent":66.38461538461539,"message":"Backing up photos.jadud.com (17/23). Waiting for lock"}
    Feb 11 09:16:30 box:locks write: current locks: {"full_backup_task_846414c7-0abc-4ae1-8432-2430e5008342":null,"app_backup_a6dc2056-829f-46c4-bf31-7a93cba4af11":"9902"}
    Feb 11 09:16:30 box:locks acquire: app_backup_a6dc2056-829f-46c4-bf31-7a93cba4af11
    Feb 11 09:16:30 box:backuptask fullBackup: app photos.jadud.com backup finished. Took 0.002 seconds
    Feb 11 09:16:30 box:locks write: current locks: {"full_backup_task_846414c7-0abc-4ae1-8432-2430e5008342":null}
    Feb 11 09:16:30 box:locks release: app_backup_a6dc2056-829f-46c4-bf31-7a93cba4af11
    Feb 11 09:16:30 box:backuptask fullBackup: skipped backup ...
    Feb 11 09:16:30 box:tasks setCompleted - 9902: {"result":[],"error":null,"percent":100}
    Feb 11 09:16:30 box:tasks updating task 9902 with: {"completed":true,"result":[],"error":null,"percent":100}
    Feb 11 09:16:30 box:taskworker Task took 0.066 seconds
    Feb 11 09:16:30 Exiting with code 0

If I try and kick off the backup, it starts up and exits immediately. Is there a lock floating somewhere? (Is that the full backup task lock?)
No backups are running that I can see, but this is now a new behavior. I have rebooted the machine, and this does not change.
No doubt, I've created this problem through my iterations.