One bit flipped. Now I don't know which file is real
-
We released backup integrity computation in 9.0. In 9.1, we have added the verifier. At a high level, it stores sizes and hashes of the files in a file called .backupinfo alongside the backup itself. The .backupinfo is further checksummed and stored in the database so that its signature can be verified when it's put to use.
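To make the idea concrete, here is a rough sketch of a size-and-hash manifest, not the actual implementation. The post does not describe the real .backupinfo format, so the dictionary layout, the choice of SHA-256, and the names `file_sha256`, `build_manifest`, `manifest_checksum` and `verify` are all assumptions made for illustration.

```python
import hashlib
import json
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its size and hash."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            manifest[rel] = {
                "size": os.path.getsize(full),
                "sha256": file_sha256(full),
            }
    return manifest


def manifest_checksum(manifest):
    """Checksum of the manifest itself (hypothetical: the value kept elsewhere)."""
    canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def verify(root, manifest):
    """Return (path, reason) pairs for files that are missing or do not match."""
    failures = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full):
            failures.append((rel, "missing"))
        elif os.path.getsize(full) != expected["size"]:
            failures.append((rel, "size mismatch"))
        elif file_sha256(full) != expected["sha256"]:
            failures.append((rel, "hash mismatch"))
    return failures
```

With a layout like this, a file whose size matches but whose content changed shows up as a "hash mismatch", which is the kind of failure described further down.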
The main intention was to catch bugs in the backup logic and also, in theory, to detect bitrot. While testing, we already found some bugs in the rsync logic. In one situation, the code would not delete files that no longer existed in the source, so the backup ended up with superfluous files.
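The superfluous-file case boils down to a set difference between what the source contains and what the backup contains. A small sketch, assuming both sides are plain local directories; the names `list_files` and `superfluous_files` are hypothetical and not taken from the actual rsync logic:

```python
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def superfluous_files(source_root, backup_root):
    """Files present in the backup that no longer exist in the source."""
    return sorted(list_files(backup_root) - list_files(source_root))
```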
Generally, my approach has been to blame the code whenever I see an integrity check fail. Today, I noticed that my own backups are failing integrity checks. Debugging further, I found that exactly one video file fails the check. The size of the file matches but the hash is different. I found this (built-in) tool called cmp:
# cmp VID_20200712_143638936.mp4 '/home/yellowtent/appsdata/19709657-2cf0-4d3f-8b79-429429d95b17/data/libraries/photos/USA - Jul 2020/VID_20200712_143638936.mp4'
VID_20200712_143638936.mp4 /home/yellowtent/appsdata/19709657-2cf0-4d3f-8b79-429429d95b17/data/libraries/photos/USA - Jul 2020/VID_20200712_143638936.mp4 differ: byte 26595693, line 101836
OK, so it differs in byte 26595693. I found that I can start checking other bytes from an offset using the -i arg.
# cmp -i 30000000 VID_20200712_143638936.mp4 '/home/yellowtent/appsdata/19709657-2cf0-4d3f-8b79-429429d95b17/data/libraries/photos/USA - Jul 2020/VID_20200712_143638936.mp4'
<nothing>
OK, so it is fine from offset 30000000 to EOF (which was 84666735).
Bisecting slowly.... mostly because I thought I was going to find the unthinkable.. and I did!
# cmp -i 26595693 VID_20200712_143638936.mp4 '/home/yellowtent/appsdata/19709657-2cf0-4d3f-8b79-429429d95b17/data/libraries/photos/USA - Jul 2020/VID_20200712_143638936.mp4'
<nothing>
Wow.. actual bitrot at offset 26595692 (cmp numbers bytes from 1, so byte 26595693 is offset 26595692, and everything after it is identical). Well, what's in that specific byte?
# xxd -b -l 1 -s 26595692 VID_20200712_143638936.mp4
0195d16c: 01010011  S
# xxd -b -l 1 -s 26595692 '/home/yellowtent/appsdata/19709657-2cf0-4d3f-8b79-429429d95b17/data/libraries/photos/USA - Jul 2020/VID_20200712_143638936.mp4'
0195d16c: 00010011  .
Wow, 1 bit flipped. I have never seen this in real life.
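The same check can be scripted instead of bisected by hand. The sketch below lists every offset where two equally sized files differ and then XORs the two byte values to show which bits changed. The function names `differing_bytes` and `flipped_bits` are made up for this example; the two byte values at the end are the ones from the xxd output above.

```python
def differing_bytes(path_a, path_b, chunk_size=1 << 20):
    """Yield (offset, byte_a, byte_b) for each 0-based offset where the files differ.

    Only the common length is compared; here the sizes are already known to match.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a or not b:
                break
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        yield offset + i, x, y
            offset += min(len(a), len(b))


def flipped_bits(byte_a, byte_b):
    """Return the bit positions (0 = least significant) where two bytes differ."""
    diff = byte_a ^ byte_b
    return [bit for bit in range(8) if diff & (1 << bit)]


# The two byte values shown by xxd: 'S' (0x53) and 0x13.
print(flipped_bits(0b01010011, 0b00010011))  # [6]
```

Bit 6 has the value 0x40, so the two bytes differ in exactly one bit. Note that the offsets here are 0-based like xxd's, not 1-based like cmp's.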
I wish this was a text file, because now I don't know which one is corrupt: the backup or the original.
This whole thing got me unreasonably excited, thanks for coming to my TED talk.