Backup failing with "copy code: 1, signal: null": "cannot create hard link" "operation not permitted"
-
@girish this sounds to me like a resource or policy exhaustion issue, like when the ulimit is too low or we run out of inodes.
Is anything else running in the kernel, like SELinux?
Are we hitting hard-link limits with large enough backups? I believe the limit on ext4 is 65k.
It would be interesting to switch filesystems and see if it happens on XFS, for example.
Do you have an Object Store target option via S3?
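To rule out the inode and ulimit theories before switching filesystems, a quick check could look like this minimal Python sketch. The helper names are made up for illustration; from a shell, `df -i` and `ulimit -n` report roughly the same numbers.

```python
import os
import resource
from typing import Tuple


def inode_usage(path: str) -> Tuple[int, int]:
    """Return (total_inodes, free_inodes) for the filesystem containing path."""
    st = os.statvfs(path)
    # f_files = total inodes, f_ffree = free inodes. Filesystems that allocate
    # inodes dynamically (btrfs, for example) may report 0 for both.
    return st.f_files, st.f_ffree


def open_file_limit() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limit; `ulimit -n` shows the soft one."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

For example, `inode_usage("/cloudron-backups")` would show whether the backup filesystem is anywhere near out of inodes.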
-
I think the 65k limit is on the number of hard links to a single file, not on the number of hard links across the whole filesystem.
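Since the ext4 cap (65,000 links, as far as I know) applies per inode, the useful question is how many links the most-linked file in the backup tree has. A minimal sketch using `os.lstat().st_nlink`; `max_link_count` is a hypothetical helper, not part of any tool mentioned here.

```python
import os
from typing import Optional, Tuple


def max_link_count(root: str) -> Tuple[int, Optional[str]]:
    """Return (highest st_nlink, one path with that count) under root."""
    best_count, best_path = 0, None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                nlink = os.lstat(path).st_nlink
            except FileNotFoundError:
                # File vanished between listing and stat (e.g. a live mail spool).
                continue
            if nlink > best_count:
                best_count, best_path = nlink, path
    return best_count, best_path
```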
The rabbit hole goes as deep as we want it to.
I think I found the problem, though of course I still have to try it out.
audit_log_path_denied
Here, https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L955, is where the audit log entry is raised. I am no kernel expert, but a casual reading of the comment "Allowed if owner and follower match" suggests that the owner of the file and the process creating the link do not match. The hard linking process runs as user yellowtent.
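A simplified Python model of the hard-link permission check, as I understand it from `fs/namei.c`: the quoted comment belongs to the symlink-following check, while hard links go through `may_linkat`, which applies when the `fs.protected_hardlinks` sysctl is 1. A non-owner may only link a "safe" source: a regular file that is not setuid, not setgid-executable, and both readable and writable by the caller. This is a model, not kernel code; it ignores ACLs, user namespaces and the exact capability rules, and uid 1000 below is a placeholder for yellowtent.

```python
import stat
from dataclasses import dataclass
from typing import Set


@dataclass
class Inode:
    uid: int
    gid: int
    mode: int  # full st_mode, e.g. stat.S_IFREG | 0o644


def _can_read_write(inode: Inode, uid: int, gids: Set[int]) -> bool:
    # Simplified DAC check: owner bits, else group bits, else "other" bits.
    perms = inode.mode & 0o777
    if uid == inode.uid:
        bits = (perms >> 6) & 0o7
    elif inode.gid in gids:
        bits = (perms >> 3) & 0o7
    else:
        bits = perms & 0o7
    return (bits & 0o6) == 0o6  # needs both read and write


def is_safe_hardlink_source(inode: Inode, uid: int, gids: Set[int]) -> bool:
    if not stat.S_ISREG(inode.mode):
        return False  # special files are never safe
    if inode.mode & stat.S_ISUID:
        return False  # setuid files are never safe
    if (inode.mode & (stat.S_ISGID | stat.S_IXGRP)) == (stat.S_ISGID | stat.S_IXGRP):
        return False  # setgid executables are never safe
    return _can_read_write(inode, uid, gids)


def may_link(inode: Inode, uid: int, gids: Set[int],
             protected_hardlinks: bool = True, cap_fowner: bool = False) -> bool:
    if not protected_hardlinks:
        return True  # sysctl off: no extra restriction
    if uid == inode.uid:
        return True  # the owner can always hard link
    if is_safe_hardlink_source(inode, uid, gids):
        return True
    return cap_fowner  # otherwise only CAP_FOWNER gets through (EPERM)
```

With a root-owned 0644 file like the zero-byte dovecot-uidlist.lock, `may_link(Inode(0, 0, stat.S_IFREG | 0o644), 1000, {1000})` is False, which matches the "operation not permitted" in the title.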
root@my:/cloudron-backups/snapshot/box/mail# find . -user root -type f
./blah/blah/dovecot-uidlist.lock
./blah/blah/1600972556.M892349P24001.69d0c668883d,S=6124,W=6234:2,S
Bingo! For some reason, these two specific files are not owned by yellowtent; they are owned by root. This looks like a bug or race in the code that creates the snapshot. Curiously, both files above are 0 bytes, so maybe that's causing some strange event ordering.
-
Yay, found the problem. The issue is that if a file disappears while we are creating the snapshot, the code errors out. Normally, files in the snapshot are chowned to the yellowtent user, but on an error it ends up creating an empty file in the snapshot directory owned by root. The hard linking code runs as the yellowtent user, and thus hard linking fails. Phew!
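The actual snapshot code isn't shown in the thread. As a hypothetical illustration of one way to avoid this failure mode, here is a Python sketch: a vanished source is skipped without creating anything, and the copy is written to a temporary name and renamed into place only after the chown succeeds. `snapshot_file` and the `.snapshot-tmp` suffix are made up; in real use the process would need enough privilege to chown to yellowtent.

```python
import contextlib
import os
import shutil


def snapshot_file(src: str, dst: str, uid: int, gid: int) -> bool:
    """Copy src to dst owned by uid:gid; return False if src vanished first."""
    try:
        fsrc = open(src, "rb")
    except FileNotFoundError:
        # Source disappeared (e.g. a short-lived dovecot lock file):
        # skip it and create nothing in the snapshot.
        return False
    tmp = dst + ".snapshot-tmp"  # hypothetical temp-file naming
    try:
        with fsrc, open(tmp, "wb") as fdst:
            shutil.copyfileobj(fsrc, fdst)
        os.chown(tmp, uid, gid)  # fix ownership before the file becomes visible
        os.replace(tmp, dst)     # atomic rename within the same filesystem
    except BaseException:
        # Remove any partial copy so no wrongly-owned file is left behind.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
    return True
```

The temp-and-rename step means `dst` either appears with the right owner or not at all, so the hard linking pass never sees a root-owned leftover.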