To conclude this: it was a memory issue. The instance as a whole was a bit overcommitted.
If the backup task is idle, it won't consume any memory. Also, Cloudron does not reserve memory based on the limits set, neither for the backup task nor for apps. The limit is just there to stop a rogue app or the backup task from killing other apps.
@leggias hm, getting a SIGSEGV is odd here. If you like, you can enable remote SSH support and send us a mail to firstname.lastname@example.org mentioning your dashboard domain. Then we can take a direct look at your setup to get to the bottom of the issue.
Disk snapshots or VPS snapshots don't work well with Cloudron, since backups are per-app and not per-server on Cloudron. This is to be able to rollback/restore individual apps without interfering with the system or other apps running on it.
Still, as a secondary fallback backup solution this can be recommended.
That's what I do. I do Cloudron backups to a Hetzner Storage Box, plus pay Hetzner for their automated backups, plus occasionally do a snapshot too.
Although really I ought to backup to another provider too, to avoid the potential issues @girish once had with DO.
@robi there is already a bunch of "workarounds" for rsync. Empty directories and the executable bit of files cannot be stored in most object storage. So, there is an fsmetadata.json file that stores this information outside of the files. When restoring, we use that file to restore the state. I guess we can extend that file to also save and restore timestamps.
If anyone wants this leave a note and I can look into it in the future.
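To illustrate the idea (this is a sketch of the general technique, not Cloudron's actual fsmetadata.json format, and the function names are hypothetical):

```python
import json
import os
import stat

def save_fs_metadata(root, out_path):
    """Record empty dirs and executable files, which object storage can't keep."""
    meta = {"emptyDirs": [], "execFiles": []}
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if not dirnames and not filenames:
            meta["emptyDirs"].append(rel)
        for name in filenames:
            if os.stat(os.path.join(dirpath, name)).st_mode & stat.S_IXUSR:
                meta["execFiles"].append(os.path.join(rel, name))
    with open(out_path, "w") as f:
        json.dump(meta, f)

def restore_fs_metadata(root, meta_path):
    """Re-create empty dirs and re-set executable bits after a restore."""
    with open(meta_path) as f:
        meta = json.load(f)
    for d in meta["emptyDirs"]:
        os.makedirs(os.path.join(root, d), exist_ok=True)
    for p in meta["execFiles"]:
        full = os.path.join(root, p)
        os.chmod(full, os.stat(full).st_mode | stat.S_IXUSR)
```

Extending such a file with per-path timestamps would follow the same pattern: record `st_mtime` during the walk and replay it with `os.utime` on restore.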
By the way, since there are a ton of backup crashes that seem memory-related... is it possible to improve the logging so that there's something in there saying "OutOfMemory" or similar, so people know why it's crashing?
Yeah, I would love this as well. Sadly, there seems to be no way on Linux to know this reliably. A process just gets killed unceremoniously with SIGKILL and that's that 😕 If anyone knows how this can be detected, that would be great.
Not sure what was going on before but things appear to be working fine now.
Thank you @girish for the clarification on the different types of backups referenced in the log and the explanation that preserveSecs will remain for 3 weeks regardless of the normal backup retention period.
@d19dotca Yes, I think you're right, thank you.
I need to look into that.
Or move some long-term storage files ("archive") into Minio.
That might just be moving the problem, but dividing the problem might be a partial solution.
@girish Thanks for looking at this. Just triggered another backup with all apps active and it succeeded. I guess then it's finally time to move to the new hardware (plus making sure that the new system is using the C locale by default).
@girish haha, yeah that's fair. Deleting backups is probably one of those tasks that should intentionally be difficult to do to avoid any issues (i.e. they need to delete the backups manually).
The only thing I can suggest is maybe a little icon beside each backup that is past the retention policy (or part of a version upgrade backup) which explains it quickly and links to the docs, or maybe just one overall info button in the corner of the backups page. Not sure which is most feasible. I'll formally file a feature request for that in a moment.
@girish no, they're pretty useless. Their web UI S3 console is such crap it can't handle the chatty API requests and keeps timing out. Also I may be wrong that multiple directories are because of failures and restarts. It just looks like multiple changed apps per day get a new dir.
So I am attempting other workarounds. Like creating a new bucket and just nuking the old one.
rsync isn't great for object store backups as it makes a ton of small files.
tgz isn't great as it's a lot of repeated information.
We need something hybrid that is the best of both.
Something like backing up to a local Minio much more quickly, then doing an object-store-to-object-store transfer offsite, which is much more efficient. This may also offer an opportunity to dedupe and further optimize.
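As a rough sketch of that two-stage idea using the MinIO client's `mc mirror` (the `local` and `offsite` aliases and paths here are hypothetical, configured beforehand with `mc alias set`):

```shell
# Stage 1: fast backup from the server's filesystem into a local MinIO bucket
mc mirror --overwrite /var/backups/cloudron local/backups

# Stage 2: bucket-to-bucket sync to the offsite object store
mc mirror --overwrite local/backups offsite/backups
```

Whether this actually dedupes or just shifts the small-file chattiness to the second hop would depend on how the local bucket is laid out.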
The first level is the per-app level. Right now, apps are already backed up one by one, but they are not stored or reported individually. And this is the missing feature in my opinion.
Ah no, they are listed in app -> backups. So, even if you do a full backup, each individual app backup will be listed in app -> backups.
Yes, they are listed at the app level, but there's no reporting at the app level because the backup succeeds or fails at the box level.
You can also use that backup to restore/clone the app.
Yes, but only if the backup succeeds for all apps.
So, again, I think the current issue is that everything is treated as a whole while it makes more sense in my opinion to treat each app individually and then in the end (optionally?) bundle the individual parts as a whole.
Yes, so they are treated individually, it's actually very close to what you want. The only issue is that when a full backup fails, those successful individual app backups that are part of a failed full backup will get removed/cleaned up every night.
Exactly, they are per app but not treated like that in the end, because success or failure is determined for the whole box.
I have made a task to fix the behavior, let's see.