The "check" command should handle damaged but still recoverable chunks differently

Please describe what you are doing to trigger the bug:
I tested the “Erasure Coding” feature using the duplicacy CLI in version 3.2.3 (254953). First, I created a backup. Then, I opened a chunk file in a hex editor and modified a few bytes. Afterward, I executed duplicacy check -chunks.
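For reproducibility, the hex-editor step can be replaced with a few lines of Go. The chunk path below is a placeholder for whichever chunk file you pick from the test storage, and the offset and bytes written are arbitrary:

```go
package main

import (
	"log"
	"os"
)

func main() {
	// Placeholder path: substitute any chunk file from the test storage.
	path := `N:\NAS\duplicacy-erasure-coding-storage\chunks\00\d3b760...`

	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Overwrite four bytes at an arbitrary offset inside the chunk,
	// simulating the "modified a few bytes in a hex editor" step.
	if _, err := f.WriteAt([]byte{0xDE, 0xAD, 0xBE, 0xEF}, 4096); err != nil {
		log.Fatal(err)
	}
}
```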

Please describe what you expect to happen (but doesn’t):

  • I expected a summary of issues at the end. With over 60,000 chunks in my test, it is easy to overlook that a chunk was only verifiable after a repair, which succeeded only because of the parity data the enabled “Erasure Coding” feature provides (a sketch of such a summary follows at the end of this report).
  • Additionally, the final message “All 61736 chunks have been successfully verified” is misleading. The chunk is readable right now, but its longevity is uncertain: anyone familiar with failing drives knows that surrounding sectors or cells are likely to fail next. If the intention is to convey “everything is still readable, albeit in need of repairs,” the message should state that n chunks need restoration. I am not sure every user seeing the following output clearly understands that a chunk in storage is still damaged:
duplicacy check -chunks
Repository set to C:\Testdata
Storage set to N:\NAS\duplicacy-erasure-coding-storage
Listing all chunks
1 snapshots and 1 revisions
Total chunk size is 351,635M in 61737 chunks
All chunks referenced by snapshot TestData2 at revision 1 exist
Verifying 61736 chunks
Skipped 61735 chunks that have already been verified before
Recovering a 8841766 byte chunk from 520104 byte shards: ********-*********--
Verified chunk 00d3b760115d58eaa24b48d2bbc0feb655ed23ee07fb2ba8ce4bcb101040bdaf (1/1), 94.72MB/s 00:00:00 100.0%
All 61736 chunks have been successfully verified
Added 1 chunks to the list of verified chunks
  • Given my argument that a chunk that is only readable after recovery should not be labeled “successfully verified,” I did not expect such chunks to be added to the “Verified Chunks” list. As a consequence, subsequent calls such as duplicacy check -chunks -rewrite do nothing, because these chunks are skipped (this is the primary reason I filed this as a bug report rather than a feature request):
duplicacy check -chunks -rewrite
Repository set to C:\Testdata
Storage set to N:\NAS\duplicacy-erasure-coding-storage
Listing all chunks
1 snapshots and 1 revisions
Total chunk size is 351,635M in 61737 chunks
All chunks referenced by snapshot TestData2 at revision 1 exist
Verifying 61736 chunks
Skipped 61736 chunks that have already been verified before
All 61736 chunks have been successfully verified
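To make the “Recovering a 8841766 byte chunk from 520104 byte shards” line above more concrete, here is a minimal, self-contained sketch using the klauspost/reedsolomon Go library (which, as far as I know, is the Reed-Solomon implementation behind duplicacy’s erasure coding). The 5 data / 2 parity layout is an assumption for illustration; the real shard counts depend on the parameters chosen when the storage was initialized:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// Assumed layout: 5 data shards plus 2 parity shards.
	enc, err := reedsolomon.New(5, 2)
	if err != nil {
		log.Fatal(err)
	}

	chunk := bytes.Repeat([]byte("chunk data "), 1000)

	// Split the chunk into data shards, then compute the parity shards.
	shards, err := enc.Split(chunk)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate the hex-editor damage: drop one data and one parity shard.
	// Recovery succeeds as long as no more shards are lost than there
	// are parity shards.
	shards[1], shards[6] = nil, nil

	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}
	ok, err := enc.Verify(shards)
	fmt.Println("verified after recovery:", ok, err)
}
```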

Just for completeness: after manually removing the damaged chunk from the verified_chunks file in the repository’s cache folder and rerunning the previous command, you get the following output (a scripted version of this manual workaround is sketched after the listing):

duplicacy check -chunks -rewrite
Repository set to C:\Testdata
Storage set to N:\NAS\duplicacy-erasure-coding-storage
Listing all chunks
1 snapshots and 1 revisions
Total chunk size is 351,635M in 61737 chunks
All chunks referenced by snapshot TestData2 at revision 1 exist
Verifying 61736 chunks
Skipped 61735 chunks that have already been verified before
Recovering a 8841766 byte chunk from 520104 byte shards: ********-*********--
The chunk 00d3b760115d58eaa24b48d2bbc0feb655ed23ee07fb2ba8ce4bcb101040bdaf has been re-uploaded
Verified chunk 00d3b760115d58eaa24b48d2bbc0feb655ed23ee07fb2ba8ce4bcb101040bdaf (1/1), 69.19MB/s 00:00:00 100.0%
All 61736 chunks have been successfully verified
Added 1 chunks to the list of verified chunks
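The removal itself can be scripted. This sketch rests on two assumptions about internal details that may differ between versions and should be verified first: that verified_chunks is a JSON object mapping chunk hashes to verification timestamps, and that it lives under .duplicacy\cache\<storage>\ in the repository:

```go
package main

import (
	"encoding/json"
	"log"
	"os"
)

func main() {
	// ASSUMPTION: verified_chunks is JSON of the form
	// {"<chunk hash>": <unix timestamp>, ...}. Path and format are
	// internal details; check them against your duplicacy version.
	path := `C:\Testdata\.duplicacy\cache\default\verified_chunks`
	damaged := "00d3b760115d58eaa24b48d2bbc0feb655ed23ee07fb2ba8ce4bcb101040bdaf"

	data, err := os.ReadFile(path)
	if err != nil {
		log.Fatal(err)
	}
	var entries map[string]int64
	if err := json.Unmarshal(data, &entries); err != nil {
		log.Fatal(err)
	}

	// Drop the damaged chunk so the next check -chunks revisits it.
	delete(entries, damaged)

	out, err := json.Marshal(entries)
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile(path, out, 0644); err != nil {
		log.Fatal(err)
	}
}
```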

Please describe what actually happens (the wrong behaviour):
The check process completes without alerting the user that any chunks were damaged but recoverable. In addition, such chunks are added to the “Verified Chunks” list, so they are skipped during future checks and during the repair a user would expect from check -chunks -rewrite.
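To illustrate the behaviour I would expect instead, here is a small sketch (not duplicacy’s actual code; all names are hypothetical) of check bookkeeping that records chunks that were only readable after an erasure-coding repair, reports them in the final summary, and keeps them out of the verified list:

```go
package main

import "fmt"

// checkStats separates clean chunks from chunks that needed recovery.
type checkStats struct {
	verified  int
	recovered []string // chunk IDs that were only readable after repair
}

func (s *checkStats) record(chunkID string, neededRecovery bool) {
	s.verified++
	if neededRecovery {
		s.recovered = append(s.recovered, chunkID)
	}
}

func (s *checkStats) summary() string {
	if len(s.recovered) == 0 {
		return fmt.Sprintf("All %d chunks have been successfully verified", s.verified)
	}
	return fmt.Sprintf(
		"%d chunks verified, but %d of them are damaged and were only readable after repair: %v",
		s.verified, len(s.recovered), s.recovered)
}

func main() {
	var stats checkStats
	stats.record("00d3b760115d58eaa2...", true) // the damaged chunk from this report
	fmt.Println(stats.summary())
	// Only chunks recorded with neededRecovery == false would be written to
	// verified_chunks, so a later check -chunks -rewrite still revisits
	// (and can actually rewrite) the damaged one.
}
```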
