Some corrupt files in a large restore

Thanks for creating a great backup tool.

I have backed up several TB to Google Drive using the Web Edition. To verify the integrity of the files, I did a complete restore of everything and then compared all the files to what I had stored locally. Of the several hundred thousand files, 18 turned out to be corrupt. The restored files were the same size as the originals, but their contents differed slightly. Some of the files were images and showed visible artifacts. According to the logs, the restore was successful.

I then did a second restore of the corrupted files, and this time they were restored without any differences from the originals.

Do you have any ideas about why this happened and what I can do to avoid the issue?

Duplicacy checks the file hash after a file has been restored, and will error out if there is a mismatch. The code there is pretty straightforward:

So could the corruption be caused by some disk issue?

Do you still have the corrupt files? It would be interesting to compare them with successfully restored files to see what parts are different.
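
For anyone who wants to do this kind of comparison themselves, here is a minimal Python sketch that lists the byte ranges where two files differ. The function name `diff_ranges` and its parameters are made up for illustration; this is not part of Duplicacy.

```python
def diff_ranges(path_a, path_b, chunk_size=1 << 20):
    """Return a list of (start, end) inclusive byte offsets where the files differ.

    Hypothetical helper for illustration; if one file is shorter, the
    missing tail counts as a difference.
    """
    ranges = []
    start = None  # start offset of the differing run currently open, if any
    offset = 0    # file offset of the first byte of the current chunk
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a == b:
                # Whole chunk matches: close any run that was still open.
                if start is not None:
                    ranges.append((start, offset - 1))
                    start = None
            else:
                # Chunk differs somewhere: walk it byte by byte.
                for i in range(max(len(a), len(b))):
                    same = i < len(a) and i < len(b) and a[i] == b[i]
                    if not same and start is None:
                        start = offset + i
                    elif same and start is not None:
                        ranges.append((start, offset + i - 1))
                        start = None
            offset += max(len(a), len(b))
    if start is not None:
        ranges.append((start, offset - 1))
    return ranges
```

Calling `diff_ranges("original.jpg", "restored.jpg")` (placeholder file names) returns the differing ranges. Bytes inside a corrupted region that happen to match the original will split it into several ranges, so it is worth looking at the first start and the last end as well.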

Thanks for your reply, @gchen!

I can send you a few samples. How do I get in touch with you directly?

Just send a private message:

Thanks @TheBestPessimist!

Thank you for sending over the samples. Compared with the original file, the corruption starts at 0x00128000 and ends at 0x0023ffff. This alone almost rules out Duplicacy as the culprit: Duplicacy writes the file in chunks, and chunk boundaries are pretty random, so a problem with a chunk would be unlikely to line up with such round offsets.

But more interestingly, the corrupted data are just repeats of the same sequence of 512 bytes:

00128000: 241a 9c92 6d85 ce6d 6f93 4cd2 44fc dc3b  $...m..mo.L.D..;
00128010: 9825 67cf 3f77 eac1 c5b7 19f8 16ee f96f  .%g.?w.........o
00128020: 7648 30f5 c961 8735 a3da 2aee a0d8 92c3  vH0..a.5..*.....
00128030: cc6c cd9b 9b53 a069 7901 e494 72d5 4f37  .l...S.iy...r.O7
00128040: aa93 9e81 254c 5ddd d725 b1ba 1cc7 686b  ....%L]..%....hk
00128050: 00b6 abb7 f73e 7631 ad48 42a3 aeb1 05df  .....>v1.HB.....
00128060: deda 645c 8128 1365 0b6f 1f49 78a2 3e33  ..d\.(.e.o.Ix.>3
...
00128200: 241a 9c92 6d85 ce6d 6f93 4cd2 44fc dc3b  $...m..mo.L.D..;
00128210: 9825 67cf 3f77 eac1 c5b7 19f8 16ee f96f  .%g.?w.........o
00128220: 7648 30f5 c961 8735 a3da 2aee a0d8 92c3  vH0..a.5..*.....
00128230: cc6c cd9b 9b53 a069 7901 e494 72d5 4f37  .l...S.iy...r.O7
00128240: aa93 9e81 254c 5ddd d725 b1ba 1cc7 686b  ....%L]..%....hk
00128250: 00b6 abb7 f73e 7631 ad48 42a3 aeb1 05df  .....>v1.HB.....
00128260: deda 645c 8128 1365 0b6f 1f49 78a2 3e33  ..d\.(.e.o.Ix.>3

512 bytes happens to be the default sector size for hard disks, so I believe this was caused by some unknown issue on your disk.
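
The two observations above (a damaged range that covers whole sectors, and data that repeats every 512 bytes) can be checked with a few lines of Python. This is only a sketch: the helper names are invented for illustration, and the 512-byte sector size is the assumption stated above.

```python
SECTOR = 512  # assumed sector size in bytes

def is_sector_aligned(start, end, sector=SECTOR):
    """True if the inclusive byte range [start, end] covers whole sectors."""
    return start % sector == 0 and (end + 1) % sector == 0

def repeats_with_period(data, period=SECTOR):
    """True if every byte equals the byte `period` positions before it."""
    return len(data) > period and data[period:] == data[:-period]

# Offsets reported for the corrupted sample in this thread.
start, end = 0x00128000, 0x0023FFFF
print(is_sector_aligned(start, end))   # True
print((end + 1 - start) // SECTOR)     # 2240 sectors
```

To test the repetition on a real file, read the bytes from `start` to `end` into `data` and pass them to `repeats_with_period(data)`.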

Thanks for investigating this, @gchen! It’s appreciated. And it’s very reassuring to hear that there is nothing wrong with the backup. I will have to find out what’s wrong with the disk, though.

Luckily, I had the original files available for comparison and could detect the problem. That would not have been the case in a real data loss situation, though. As a precaution, I have now created hash fingerprints for all my files.
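
For reference, a fingerprint list like this can be built with a short Python script. The sketch below assumes SHA-256 and an in-memory dictionary; the function names are hypothetical, and other tools or hash functions would work just as well.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest

def find_mismatches(root, manifest):
    """Return relative paths under root that are missing or hash differently."""
    bad = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel)
    return bad
```

The idea is to run `build_manifest` on the source folder, keep the result somewhere safe (for example as a JSON file that is itself backed up), and after a restore run `find_mismatches` on the restored folder. That way a restore can be verified even when the original files are gone.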

Thanks again!
