Duplicacy thinks shards are incomplete because of inconsistent hashing

Please describe what you are doing to trigger the bug:
I am trying to check (and list revisions of) the same storage (and snapshot id) from two different hosts: one AMD64 machine and one ARM64 machine (MacBook Pro M1). The storage is encrypted and uses erasure coding (5/2). I compiled the latest Duplicacy on both hosts (commit 72eb339, “Bump version to 3.0.1”).

The command I run:

duplicacy -d -v list -id SNAPSHOT_ID

Please describe what you expect to happen (but doesn’t):
Both hosts should show the same result since they have the same config for the remote storage. Either the remote storage has complete shards, or it has incomplete shards.

Please describe what actually happens (the wrong behaviour):

  • The AMD64 host shows the list of revisions with no issue.
  • The ARM64 host reports incomplete shards:
Failed to decrypt the file snapshots/SNAPSHOT_ID/1: Not enough chunk data for recover; only 0 out of 7 shards are complete

I made sure that…

  • I tried this with the latest official release and also by compiling the same commit on both machines
  • I deleted .duplicacy/cache on both machines before running the command

What I have investigated so far
I tried to troubleshoot by logging some intermediate values and found something interesting. When decrypting the snapshot file on the ARM64 machine, Duplicacy compares the hash of each shard it reads with the hash stored in the snapshot file, and the comparison shows the hashes differ. This sets a flag to recover the shard from the parity shards, but recovery fails because the mismatch occurs for all 7 shards, hence the error above. On the AMD64 machine, the comparison for the same shards shows the hashes are identical (so no recovery is needed).

Here is the debug output of Duplicacy trying to decrypt the same snapshot file on the two machines. I am simply logging everything I can from the shard loop around line 429 of duplicacy_chunk.go (master branch). I only show the first 3 shards, but the results are similar for the rest:

On AMD64 machine:

**(shard 0) Bytes to be hashed        : 6475706c6963616379009afb19fdc44d5320de41f18d733fb297e6cc81e3d64d763e7ebf2731d4b1c78656d3898ce105547aea7cfc4833cc2ac8e01be53106d7add0878fe8b61259d4cffdad8e7be869fc935d2f7a6a9ccedafeb3783df90db38b4242261dd57c2651d158bf8b15
**(shard 0) Hash from hasher          : 7be84de55e1bab35efe9c7ade78a5d564d2f0ffed9894bbb9873fe4da827538e
**(shard 0) Hash from encryptedBuffer : 7be84de55e1bab35efe9c7ade78a5d564d2f0ffed9894bbb9873fe4da827538e
**(shard 0) Hasher key fed            : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 0) Hasher (digest) key       : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 0) Hasher (digest) buffer    : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 0) Hasher (digest) offset    : 0
**(shard 0) Hasher (digest) size      : 32
**(shard 0) Hasher (digest) state     : [%!!(MISSING)s(uint64=15845587454020865583) %!!(MISSING)s(uint64=11820040416388919760) %!!(MISSING)s(uint64=1376283091369227076) %!!(MISSING)s(uint64=2611923443488327891) %!!(MISSING)s(uint64=4310963063287117203) %!!(MISSING)s(uint64=13883737187602762380) %!!(MISSING)s(uint64=13714699805381954668) %!!(MISSING)s(uint64=4983270260364809079) %!!(MISSING)s(uint64=15845587454020865583) %!!(MISSING)s(uint64=11820040416388919760) %!!(MISSING)s(uint64=1376283091369227076) %!!(MISSING)s(uint64=2611923443488327891) %!!(MISSING)s(uint64=4310963063287117203) %!!(MISSING)s(uint64=13883737187602762380) %!!(MISSING)s(uint64=13714699805381954668) %!!(MISSING)s(uint64=4983270260364809079)]
**(shard 0) Hasher Size               : 32
**(shard 0) Hasher BlockSize          : 32
**(shard 0) Hasher - # bytes written  : 110

**(shard 1) Bytes to be hashed        : e860c707f02c9e8ca6f1ee1de5951eccdb8cc97568afb90f49ee5f694c3adc9b858e578e1030fe4cfd8b3a3b6812258a3b6a64a9f856e7da03172578772887f0b8e477ed3fd5168a85a94c568cf49c04e9ff2a3d28bae035ef53ab5b92bd198faefea24c3d0e202f5cb29a63f4aa
**(shard 1) Hash from hasher          : 01816459adac99bc871119f8cd1e13d1f5623fbc1c9a2cacfa315f187ba68c31
**(shard 1) Hash from encryptedBuffer : 01816459adac99bc871119f8cd1e13d1f5623fbc1c9a2cacfa315f187ba68c31
**(shard 1) Hasher key fed            : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 1) Hasher (digest) key       : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 1) Hasher (digest) buffer    : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 1) Hasher (digest) offset    : 0
**(shard 1) Hasher (digest) size      : 32
**(shard 1) Hasher (digest) state     : [%!!(MISSING)s(uint64=15845587454020865583) %!!(MISSING)s(uint64=11820040416388919760) %!!(MISSING)s(uint64=1376283091369227076) %!!(MISSING)s(uint64=2611923443488327891) %!!(MISSING)s(uint64=4310963063287117203) %!!(MISSING)s(uint64=13883737187602762380) %!!(MISSING)s(uint64=13714699805381954668) %!!(MISSING)s(uint64=4983270260364809079) %!!(MISSING)s(uint64=15845587454020865583) %!!(MISSING)s(uint64=11820040416388919760) %!!(MISSING)s(uint64=1376283091369227076) %!!(MISSING)s(uint64=2611923443488327891) %!!(MISSING)s(uint64=4310963063287117203) %!!(MISSING)s(uint64=13883737187602762380) %!!(MISSING)s(uint64=13714699805381954668) %!!(MISSING)s(uint64=4983270260364809079)]
**(shard 1) Hasher Size               : 32
**(shard 1) Hasher BlockSize          : 32
**(shard 1) Hasher - # bytes written  : 110

**(shard 2) Bytes to be hashed        : 7363cf084a0e39ccf3d6ddafd1b3fc5f625a160359851462f19bf6b05d3e80308f30a97b0ace62f5497e50af6807748af121d1f5d02f51c8ab010632dc3a824aca5cd358ae4bcbdae8f84e8d1a94203da95d6b173385058c9dd655cea85c3bd260c5742e52c2355c8f9ca718c655
**(shard 2) Hash from hasher          : 763ebc8429ada35bd2d56284666e5298a887a4081214a2aec503e9fa1410a94b
**(shard 2) Hash from encryptedBuffer : 763ebc8429ada35bd2d56284666e5298a887a4081214a2aec503e9fa1410a94b
**(shard 2) Hasher key fed            : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 2) Hasher (digest) key       : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 2) Hasher (digest) buffer    : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 2) Hasher (digest) offset    : 0
**(shard 2) Hasher (digest) size      : 32
**(shard 2) Hasher (digest) state     : [%!!(MISSING)s(uint64=15845587454020865583) %!!(MISSING)s(uint64=11820040416388919760) %!!(MISSING)s(uint64=1376283091369227076) %!!(MISSING)s(uint64=2611923443488327891) %!!(MISSING)s(uint64=4310963063287117203) %!!(MISSING)s(uint64=13883737187602762380) %!!(MISSING)s(uint64=13714699805381954668) %!!(MISSING)s(uint64=4983270260364809079) %!!(MISSING)s(uint64=15845587454020865583) %!!(MISSING)s(uint64=11820040416388919760) %!!(MISSING)s(uint64=1376283091369227076) %!!(MISSING)s(uint64=2611923443488327891) %!!(MISSING)s(uint64=4310963063287117203) %!!(MISSING)s(uint64=13883737187602762380) %!!(MISSING)s(uint64=13714699805381954668) %!!(MISSING)s(uint64=4983270260364809079)]
**(shard 2) Hasher Size               : 32
**(shard 2) Hasher BlockSize          : 32
**(shard 2) Hasher - # bytes written  : 110

On ARM64 machine:

**(shard 0) Bytes to be hashed        : 6475706c6963616379009afb19fdc44d5320de41f18d733fb297e6cc81e3d64d763e7ebf2731d4b1c78656d3898ce105547aea7cfc4833cc2ac8e01be53106d7add0878fe8b61259d4cffdad8e7be869fc935d2f7a6a9ccedafeb3783df90db38b4242261dd57c2651d158bf8b15
**(shard 0) Hash from hasher          : 32ba19ccd3d4003537a18bc66c2710f9d80a50dda9fc608189333db8eb170450
**(shard 0) Hash from encryptedBuffer : 7be84de55e1bab35efe9c7ade78a5d564d2f0ffed9894bbb9873fe4da827538e
**(shard 0) Hasher key fed            : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 0) Hasher (digest) key       : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 0) Hasher (digest) buffer    : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 0) Hasher (digest) offset    : 0
**(shard 0) Hasher (digest) size      : 32
**(shard 0) Hasher (digest) state     : [%!!(MISSING)s(uint64=15845587454020865583) %!!(MISSING)s(uint64=11820040416388919760) %!!(MISSING)s(uint64=1376283091369227076) %!!(MISSING)s(uint64=2611923443488327891) %!!(MISSING)s(uint64=4310963063287117203) %!!(MISSING)s(uint64=13883737187602762380) %!!(MISSING)s(uint64=13714699805381954668) %!!(MISSING)s(uint64=4983270260364809079) %!!(MISSING)s(uint64=15845587454020865583) %!!(MISSING)s(uint64=11820040416388919760) %!!(MISSING)s(uint64=1376283091369227076) %!!(MISSING)s(uint64=2611923443488327891) %!!(MISSING)s(uint64=4310963063287117203) %!!(MISSING)s(uint64=13883737187602762380) %!!(MISSING)s(uint64=13714699805381954668) %!!(MISSING)s(uint64=4983270260364809079)]
**(shard 0) Hasher Size               : 32
**(shard 0) Hasher BlockSize          : 32
**(shard 0) Hasher - # bytes written  : 110

**(shard 1) Bytes to be hashed        : e860c707f02c9e8ca6f1ee1de5951eccdb8cc97568afb90f49ee5f694c3adc9b858e578e1030fe4cfd8b3a3b6812258a3b6a64a9f856e7da03172578772887f0b8e477ed3fd5168a85a94c568cf49c04e9ff2a3d28bae035ef53ab5b92bd198faefea24c3d0e202f5cb29a63f4aa
**(shard 1) Hash from hasher          : c01fabd1ee43d4a9233527967706b885ac65ad1e8e3fc6c0772631c6fd74b1e3
**(shard 1) Hash from encryptedBuffer : 01816459adac99bc871119f8cd1e13d1f5623fbc1c9a2cacfa315f187ba68c31
**(shard 1) Hasher key fed            : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 1) Hasher (digest) key       : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 1) Hasher (digest) buffer    : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 1) Hasher (digest) offset    : 0
**(shard 1) Hasher (digest) size      : 32
**(shard 1) Hasher (digest) state     : [%!!(MISSING)s(uint64=15845587454020865583) %!!(MISSING)s(uint64=11820040416388919760) %!!(MISSING)s(uint64=1376283091369227076) %!!(MISSING)s(uint64=2611923443488327891) %!!(MISSING)s(uint64=4310963063287117203) %!!(MISSING)s(uint64=13883737187602762380) %!!(MISSING)s(uint64=13714699805381954668) %!!(MISSING)s(uint64=4983270260364809079) %!!(MISSING)s(uint64=15845587454020865583) %!!(MISSING)s(uint64=11820040416388919760) %!!(MISSING)s(uint64=1376283091369227076) %!!(MISSING)s(uint64=2611923443488327891) %!!(MISSING)s(uint64=4310963063287117203) %!!(MISSING)s(uint64=13883737187602762380) %!!(MISSING)s(uint64=13714699805381954668) %!!(MISSING)s(uint64=4983270260364809079)]
**(shard 1) Hasher Size               : 32
**(shard 1) Hasher BlockSize          : 32
**(shard 1) Hasher - # bytes written  : 110

**(shard 2) Bytes to be hashed        : 7363cf084a0e39ccf3d6ddafd1b3fc5f625a160359851462f19bf6b05d3e80308f30a97b0ace62f5497e50af6807748af121d1f5d02f51c8ab010632dc3a824aca5cd358ae4bcbdae8f84e8d1a94203da95d6b173385058c9dd655cea85c3bd260c5742e52c2355c8f9ca718c655
**(shard 2) Hash from hasher          : 060985077815ed9f8b9b9292d4ae64ad4cb6c82fb750c59740cc797296d5494b
**(shard 2) Hash from encryptedBuffer : 763ebc8429ada35bd2d56284666e5298a887a4081214a2aec503e9fa1410a94b
**(shard 2) Hasher key fed            : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 2) Hasher (digest) key       : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 2) Hasher (digest) buffer    : 0000000000000000000000000000000000000000000000000000000000000000
**(shard 2) Hasher (digest) offset    : 0
**(shard 2) Hasher (digest) size      : 32
**(shard 2) Hasher (digest) state     : [%!!(MISSING)s(uint64=15845587454020865583) %!!(MISSING)s(uint64=11820040416388919760) %!!(MISSING)s(uint64=1376283091369227076) %!!(MISSING)s(uint64=2611923443488327891) %!!(MISSING)s(uint64=4310963063287117203) %!!(MISSING)s(uint64=13883737187602762380) %!!(MISSING)s(uint64=13714699805381954668) %!!(MISSING)s(uint64=4983270260364809079) %!!(MISSING)s(uint64=15845587454020865583) %!!(MISSING)s(uint64=11820040416388919760) %!!(MISSING)s(uint64=1376283091369227076) %!!(MISSING)s(uint64=2611923443488327891) %!!(MISSING)s(uint64=4310963063287117203) %!!(MISSING)s(uint64=13883737187602762380) %!!(MISSING)s(uint64=13714699805381954668) %!!(MISSING)s(uint64=4983270260364809079)]
**(shard 2) Hasher Size               : 32
**(shard 2) Hasher BlockSize          : 32
**(shard 2) Hasher - # bytes written  : 110

My conclusion so far
The hasher reads the same shard data on both machines, but on the ARM64 machine it returns a different hash for the same bytes. I have not found the root cause, but I suspect it is related to the highwayhash module and the implementation of either its Sum() or Write() method (my guess is Sum(), because the “Hasher (digest) state” in the debug output above is identical on both machines and is read after the Write() call). I am also not claiming the issue necessarily stems from the different architectures; that may just be a coincidence.

I am also willing to help further if needed.


This is very interesting. Can’t really help investigating, but would like to know what the issue is. This is something that should work reliably on all platforms. Pinging @gchen just in case.


I’m no Go pro, but just looking at the dependencies of the github.com/minio/highwayhash module, Duplicacy is pinned to the older 1.0.1 release. The following bug related to computing hashes on ARM was fixed a while ago:


I can confirm that this bug affects arm64 Linux and macOS when erasure coding is enabled. I wonder what the best way to fix it is. There are 2 options:

  1. Upgrade github.com/minio/highwayhash to 1.0.2 only. This will leave chunks created with CLI 3.0.1 in a bad state, but it can certainly be fixed by creating a new storage and then running a copy.

  2. Upgrade github.com/minio/highwayhash to 1.0.2 but also keep the 1.0.1 version, so that if a mismatch is found the hash is checked again using the 1.0.1 version. There is no need to fix existing chunks, but the downside is that the storage can’t be checked from a non-arm64 machine.
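Option 2’s fallback could look roughly like this. A sketch under assumptions: `newHash` and `legacyHash` are hypothetical wrappers for highwayhash v1.0.2 and v1.0.1 respectively; here both are simulated with sha256 variants so the snippet runs standalone.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// newHash stands in for the fixed highwayhash v1.0.2.
func newHash(data []byte) []byte {
	h := sha256.Sum256(data)
	return h[:]
}

// legacyHash stands in for the buggy v1.0.1 output on arm64, simulated
// here by hashing with a prefix so it differs from newHash.
func legacyHash(data []byte) []byte {
	h := sha256.Sum256(append([]byte("legacy:"), data...))
	return h[:]
}

// shardComplete accepts a shard if its stored hash matches either the
// corrected hasher or, as a fallback, the old buggy one, so chunks
// written by CLI 3.0.1 on arm64 remain readable.
func shardComplete(data, stored []byte) bool {
	if bytes.Equal(newHash(data), stored) {
		return true
	}
	return bytes.Equal(legacyHash(data), stored)
}

func main() {
	data := []byte("shard")
	fmt.Println(shardComplete(data, newHash(data)))    // true: written by the fixed hasher
	fmt.Println(shardComplete(data, legacyHash(data))) // true: written by the buggy 3.0.1 hasher
	fmt.Println(shardComplete(data, []byte("junk")))   // false: genuinely corrupt
}
```

The drawback noted above follows directly: a non-arm64 machine never produces the buggy digests, so its fallback check can never match chunks written by a buggy arm64 hasher.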

Option 2 seems to make more sense to me. What is your opinion?


Is only 3.0.1 impacted? What about all the previous versions, wouldn’t they have the same problem?

Hi @sevimo, my understanding is that:

  • chunks created with the official 2.7.2 CLI release are not affected (because it was built with Go < 1.16).
  • Duplicacy versions compiled from source with Go 1.16 between 16 November 2020 (2.7.2 release date) and 25 March 2021 (highwayhash v1.0.2 release date) are affected by this issue. The likelihood is quite low since Go 1.16 was first released on 16 February 2021, but it is possible.
  • Duplicacy versions compiled from source with Go 1.16 between 25 March 2021 (highwayhash v1.0.2 release date) and 6 October 2022 (when Go modules were added to Duplicacy) should not be affected by this issue.
  • the official 3.0/3.0.1 CLI releases are affected only because Duplicacy now uses Go modules, which pin highwayhash to v1.0.1 (the version with the hashing issue).
  • Duplicacy versions compiled from source since 7 October 2022 are affected for the same reason (Go modules).

Hi @gchen, thank you for your support. I don’t see a perfect solution, as we cannot (easily) replicate the hashing bug on non-arm64 machines. Option 2 makes more sense to me as well. Maybe a warning/hint can be shown when check/list fails, to let the person know the remote storage may still be “safe” and that the failure could be because they are checking an arm64 snapshot from a non-arm64 machine (which will only affect a limited number of snapshots, due to this specific bug).

I wonder how many snapshots are affected, but the number should now increase with the adoption of CLI 3.0/3.0.1.

On my side, I will fix the chunks by running a copy, because I do check my backups from a non-arm64 machine.

That’s a pickle. Do you know how many arm64 downloads there were for 3.0/3.0.1? Can’t be huge for a one-month period. I’d hate to drag around a separate version of a module (indefinitely?) just to support snapshots created under very specific circumstances: arm64 + erasure coding + October 2022, none of which is very common, to say the least. I’d be shocked if that’s a lot of instances (how many people even know about erasure coding?). Though it would potentially suck to be one of the people impacted.

This is a scenario that can happen again in the future though, not necessarily in the context of a bug, but with changes to storage format versions. I think the better approach for the long term is to provide a tool (could be part of the CLI) that can run certain maintenance tasks on existing storages. In this case it would work something like this:

  • Fix the hash bug by bringing in highwayhash 1.0.2 into mainline
  • Bump the default storage version (3 to 4, iirc) so that all version 4 data is known to be clean
  • Run a maintenance tool that checks all version 3 data and rewrites it as clean version 4 data if it’s subject to the bug [this only needs to be done by someone potentially subject to the bug]
  • All other existing version 3 data is unaffected (so the vast majority of users are not impacted), but all new data is created as version 4, which is exactly the same as version 3 except on arm64.

This way, someone with a large existing storage and just a few tainted chunks doesn’t need to copy the whole storage (which might be slow and potentially expensive). And support for the buggy version 3 on arm64 could be dropped soon (in theory, you only need one version with this maintenance functionality, so conversion can happen in the future).

Thoughts?

EDIT: And I think the first thing that should happen is big bold text in the release notes for 3.0/3.0.1 saying not to use them on arm64 for the moment, at least with erasure coding (maybe even remove the arm64 builds until the fix is in place).

Commit 6a7a2c upgraded github.com/minio/highwayhash to 1.0.2. It also checks against the incorrect hash (on arm64 machines only) to make sure that chunks created by Duplicacy CLI 3.0.1 can still be read correctly.

In addition, I added a new -rewrite option to the check command so that chunks created by CLI 3.0.1 can be rewritten with the correct highwayhash. You’ll need to first remove the file .duplicacy/cache/storage/verified_chunks, which contains the list of chunks verified by previous check commands; otherwise chunks contained in this file won’t be checked and rewritten.
