Invalid hash while checking chunks

bartcv · 16 February 2021 13:34

Hi,

I have been trialling Duplicacy for about a week now with a OneDrive for Business share (a backup every 4 hours, 100 GB). Unfortunately, my first check -chunks complained at around 90% with the following message (4 times):

The chunk 1af01dfe5e78b73f16b16b016d5ac4bb5066254711a39909f5df581e6e704bf3 has a hash id of e55b53129e9e1170a74a707f8a8297d6ff0e0ed8263f4a23b383b24c1861d610; retrying
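A rough sketch of the comparison behind this message, assuming (as the wording suggests) that check -chunks downloads each chunk, recomputes an ID from its content and compares it with the ID the chunk is stored under. The keyed SHA-256 and HYPOTHETICAL_ID_KEY are stand-ins; as far as I know Duplicacy's real derivation is keyed by values in the storage config and works on the decoded chunk content, so the IDs below will not match real chunk names.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical key: only here to make the sketch runnable.
HYPOTHETICAL_ID_KEY = b"example-id-key"


def chunk_id(content: bytes, key: bytes = HYPOTHETICAL_ID_KEY) -> str:
    """Return a hex ID derived from the chunk content (keyed SHA-256 stand-in)."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify_chunk(name: str, content: bytes) -> Optional[str]:
    """Return an error message if the content does not hash to its stored name."""
    actual = chunk_id(content)
    if actual == name:
        return None
    return f"The chunk {name} has a hash id of {actual}"
```

Any change to the stored bytes (a flipped bit, a missing tail) produces a different computed ID, which is why the same chunk keeps being reported until its content is replaced.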

I tried the procedure mentioned here: "Restore fails. The chunk... has a hash id of... retrying". A new backup does list all chunks again and finishes without an error message. However, a new check -chunks complains about the same chunk again.

What can be done? I only have one week of history, so I can start again. However, it is not a good sign that something has already gone wrong.

The OneDrive protocol is probably not the most reliable way to transfer data, but if it is supported by Duplicacy, I expect it to be almost perfectly reliable. Would erasure coding help with these kinds of errors?

gchen · 17 February 2021 05:07

Yes, you should definitely create a new storage with erasure coding enabled and start a new backup. Normally I wouldn’t suggest erasure coding for cloud storage, but considering this bug I think we should first check whether it is a OneDrive issue.

bartcv · 17 February 2021 10:48

Could I use a hex viewer to check whether the chunk was uploaded incorrectly or truncated?
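If you prefer a script to a hex viewer, a minimal sketch that prints the last bytes of a downloaded chunk file in hex-viewer style is below. The function name hex_tail is hypothetical. A file ending in a long run of 00 bytes might hint at an incomplete write, but that is only a hint: a chunk can legitimately end in any bytes, and comparing against a known-good copy is far more conclusive.

```python
from pathlib import Path
from typing import List


def hex_tail(path, count: int = 64, width: int = 16) -> List[str]:
    """Format the last `count` bytes of a file as 'offset  hex bytes' lines."""
    data = Path(path).read_bytes()
    start = max(0, len(data) - count)
    lines = []
    for offset in range(start, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in row)
        lines.append(f"{offset:08x}  {hex_part}")
    return lines
```

Usage: `print("\n".join(hex_tail("path/to/downloaded/chunk")))`, where the path is wherever you saved the chunk.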

About erasure coding: is each shard uploaded individually? If not, I would not be very confident in its error-correcting ability if a chunk is truncated during upload and the parity shards are missing.
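To make the truncation concern concrete, here is a simplified sketch of erasure coding with a single XOR parity shard. This is not Duplicacy's scheme (its erasure coding is configured with separate data and parity shard counts and is more general); it only shows the principle: parity can rebuild a shard that is known to be missing, up to the number of parity shards, and nothing beyond that.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal data shards (zero-padded) plus one XOR parity shard."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    shards = [padded[i * size:(i + 1) * size] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]


def recover(shards: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing shard (marked None) from the others."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild only one lost shard")
    shards = list(shards)
    if missing:
        # XOR of all shards (data + parity) is zero, so the lost one is the XOR of the rest.
        shards[missing[0]] = reduce(xor_bytes, [s for s in shards if s is not None])
    return shards
```

Note that the decoder must know which shard is bad (here it is marked None); in practice that requires some per-shard integrity check. A truncation that wipes out more shards than there are parity shards cannot be repaired.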

Also, rclone claims that OneDrive for Business supports a simple hash check: https://rclone.org/onedrive/#modification-time-and-hashes. Wouldn’t that be very useful? It might be non-standard, though, since Synology CloudSync says it cannot do hashing with OneDrive for Business; see the table under ‘For Advanced Users’ at https://www.synology.com/en-us/knowledgebase/DSM/help/CloudSync/cloudsync
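The rclone page linked above refers to QuickXorHash. Below is a sketch of that algorithm written from Microsoft's published reference description (160-bit state, each byte XORed in 11 bits further along with wrap-around, then the input length XORed into the last 8 bytes). Treat it as unverified: before relying on it, cross-check a few files against the value OneDrive itself reports (for example via rclone). The idea would be to hash a local copy of a chunk file and compare it with OneDrive's value for the uploaded file; as far as I know OneDrive's API reports this hash in base64.

```python
import base64

WIDTH = 160  # state width in bits (the digest is 20 bytes)
SHIFT = 11   # each input byte lands 11 bits further along than the previous one
MASK = (1 << WIDTH) - 1


def quickxorhash_bytes(data: bytes) -> bytes:
    """Raw 20-byte QuickXorHash digest (sketch of the published algorithm)."""
    state = 0
    for i, byte in enumerate(data):
        shifted = byte << ((i * SHIFT) % WIDTH)
        # Circular shift: bits pushed past bit 159 wrap around to bit 0.
        state ^= (shifted & MASK) | (shifted >> WIDTH)
    digest = bytearray(state.to_bytes(WIDTH // 8, "little"))
    # XOR the input length (64-bit little-endian) into the last 8 bytes.
    for j, b in enumerate(len(data).to_bytes(8, "little")):
        digest[WIDTH // 8 - 8 + j] ^= b
    return bytes(digest)


def quickxorhash(data: bytes) -> str:
    """Base64 form of the digest."""
    return base64.b64encode(quickxorhash_bytes(data)).decode("ascii")
```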

gchen · 17 February 2021 17:16

Is the storage encrypted? If it is, then the corruption must have happened before the upload because otherwise the error would have been a decryption failure rather than a mismatched id.
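The reasoning here can be sketched as follows, assuming the encryption is authenticated (such as AES-GCM), so that any change to the stored bytes is detected before the content ID is ever checked. An HMAC tag stands in for the encryption tag (it authenticates but does not encrypt), and HYPOTHETICAL_KEY plus the byte layout are illustrative only, not Duplicacy's chunk format.

```python
import hashlib
import hmac

HYPOTHETICAL_KEY = b"example-storage-key"


def seal(payload: bytes) -> bytes:
    """Append a 32-byte HMAC tag (stand-in for an authenticated-encryption tag)."""
    return payload + hmac.new(HYPOTHETICAL_KEY, payload, hashlib.sha256).digest()


def open_and_check(stored: bytes, expected_id: str) -> str:
    """Classify a downloaded chunk: 'ok', 'authentication failure' or 'hash mismatch'."""
    payload, tag = stored[:-32], stored[-32:]
    good_tag = hmac.new(HYPOTHETICAL_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, good_tag):
        return "authentication failure"  # bytes changed after sealing (e.g. in storage)
    if hashlib.sha256(payload).hexdigest() != expected_id:
        return "hash mismatch"  # bytes were already wrong when they were sealed
    return "ok"
```

With an unencrypted storage there is no tag, so a mismatched ID alone cannot tell you whether the damage happened before or after the upload.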

You can also try to delete the chunk (after saving a copy somewhere) and then create a new backup with a different backup id to see if the chunk can be regenerated. If so then you can compare the regenerated one with the corrupted one.
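Once you have both files, a small sketch like the one below reports their sizes and the offset of the first differing byte (the function names are hypothetical). If the first difference equals the smaller size, one copy is a prefix of the other, which would point to truncation; a difference in the middle would point to altered bytes. This is only meaningful if the regenerated chunk was built from unchanged files.

```python
from pathlib import Path
from typing import Optional, Tuple


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if a and b are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one copy is a prefix of the other
    return None


def compare_chunks(saved: str, regenerated: str) -> Tuple[int, int, Optional[int]]:
    """Return (saved size, regenerated size, first differing offset)."""
    a = Path(saved).read_bytes()
    b = Path(regenerated).read_bytes()
    return len(a), len(b), first_difference(a, b)
```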

bartcv · 17 February 2021 17:56

Not encrypted.

I tried deleting the chunk, changing the backup id, and running a new backup (as in https://forum.duplicacy.com/t/restore-fails-the-chunk-has-a-hash-id-of-retrying/1277/3). The backup did not complain, but check -chunks showed the same error on the original chunk. Maybe I did it wrong?

gchen · 17 February 2021 23:47

Interesting. Can you double check that the chunk was deleted first and then regenerated? You can run check after deleting the chunk which should report that chunk as missing.
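To find the chunk file on OneDrive before and after deleting it, the sketch below maps a chunk ID to a storage-relative path. It assumes the layout I understand Duplicacy to use by default: chunks live under chunks/, in a subdirectory named after the first two hex characters of the ID. Storages with a different nesting setting would differ, so check the actual folders. chunk_path is a hypothetical helper.

```python
def chunk_path(chunk_id: str, levels: int = 1) -> str:
    """Storage-relative chunk path, assuming `levels` two-character subdirectories."""
    parts = [chunk_id[2 * i:2 * i + 2] for i in range(levels)]
    return "/".join(["chunks", *parts, chunk_id[2 * levels:]])
```

For the chunk in this thread that gives chunks/1a/f01dfe5e78b7..., which should be gone after deletion and reported as missing by check.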

If you still get the same error with the regenerated chunk then there might be something wrong with OneDrive. Try to back up to a local storage first and then create a copy job to copy to OneDrive to see if you get the same error.

saspus · 18 February 2021 05:46

Maybe that chunk was still in the cache?

bartcv · 18 February 2021 18:15

I will try during the weekend, although it is possible the files have changed by now.

Just to make sure: if I run check after deleting that bad chunk, will it regenerate it? Or should I change the id in the .duplicacy/preferences file (which is under .duplicacy-web/repositories/localhost/...) and run a backup?

How do I back up to a local storage and copy to OneDrive without starting from scratch?

@saspus: no, it is not in the cache.

gchen · 18 February 2021 19:24

No. Running ‘check’ is simply to confirm that the chunk in question has been deleted. To regenerate the chunk you’ll need to create a new backup with a different backup id (and with the same filters). The chunk may or may not be regenerated, depending on whether the files have changed.
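Why the chunk may or may not come back: with content-defined chunking, chunk boundaries and IDs depend only on the bytes being chunked, so an identical chunk appears again only if the same bytes are fed in the same order. As I understand it, Duplicacy chunks the included files as one stream, which is why the filters matter too. The sketch uses a Rabin-Karp rolling hash with made-up size parameters, not Duplicacy's own rolling hash or settings.

```python
import hashlib
from typing import List

WINDOW = 16                  # bytes in the rolling window
BASE, MOD = 257, (1 << 31) - 1
MASK = (1 << 6) - 1          # cut roughly every 64 bytes on random data
MIN_SIZE, MAX_SIZE = 16, 256
POW = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window


def chunk_boundaries(data: bytes) -> List[int]:
    """End offsets of content-defined chunks."""
    cuts, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD  # drop the byte leaving the window
        h = (h * BASE + byte) % MOD
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            cuts.append(i + 1)
            start, h = i + 1, 0
    if start < len(data):
        cuts.append(len(data))
    return cuts


def chunk_ids(data: bytes) -> List[str]:
    """SHA-256 of each chunk; identical input always yields identical IDs."""
    ids, prev = [], 0
    for end in chunk_boundaries(data):
        ids.append(hashlib.sha256(data[prev:end]).hexdigest())
        prev = end
    return ids
```

Changing one byte alters the chunk that contains it (and possibly a neighbour), while later chunks usually come out identical once the boundaries resynchronise.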

You can’t change the backup id by modifying the .duplicacy/preferences file, since this file is actually auto-generated by the web GUI.

@saspus may be right. As a first step, remove the cache directory at .duplicacy-web/repositories/localhost/all/.duplicacy/cache and run check -chunks again to make sure it is not complaining about a corrupted on-disk copy.
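A minimal sketch of that first step, using the cache path given above relative to the home directory. It assumes the Web GUI lives in your home directory and that no backup or check job is running while the cache is removed; clear_chunk_cache is a hypothetical helper, and deleting the folder by hand works just as well.

```python
import shutil
from pathlib import Path
from typing import Optional

# Path from this thread, relative to the user's home directory.
CACHE_SUBDIR = Path(".duplicacy-web/repositories/localhost/all/.duplicacy/cache")


def clear_chunk_cache(home: Optional[Path] = None) -> bool:
    """Delete the local cache directory; return True if it existed."""
    cache = (home or Path.home()) / CACHE_SUBDIR
    if not cache.is_dir():
        return False
    shutil.rmtree(cache)
    return True
```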