Re-check existing chunks

96e2ad680c5a2575bf90 · 2 October 2021 23:15

Although one disk on the storage target had bad blocks that affected
existing chunks, this was not reported by check -chunks.

Looking at the logs, it seems to validate only newly uploaded chunks.
It reports that all chunks exist, but it only verifies the new ones.

Is there a way to force verification of all chunks, both existing and new?

saspus · 3 October 2021 00:40

What do you mean by “newly uploaded”?

By default, if you run duplicacy check -chunks, it will check the default (first) snapshot ID. You can pass -all to check all the other IDs as well. Otherwise it should be checking all chunks in that ID.

I would also suggest adding the -fossils flag.

You can pass the global -d flag to see exactly what it is doing.
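
For scripted runs, the flags mentioned above can be assembled in one place. This is only a minimal sketch, assuming the duplicacy executable is on PATH and that the command is run from an initialized repository; build_check_command and run_check are made-up helper names, not part of Duplicacy.

```python
import subprocess


def build_check_command(all_ids=False, fossils=False, debug=False):
    """Return the argument list for a `duplicacy check -chunks` run."""
    cmd = ["duplicacy"]
    if debug:
        # -d is a global option, so it goes before the subcommand.
        cmd.append("-d")
    cmd += ["check", "-chunks"]
    if all_ids:
        cmd.append("-all")      # check every snapshot ID, not only the default one
    if fossils:
        cmd.append("-fossils")  # also look among fossils for missing chunks
    return cmd


def run_check(repo_dir, **options):
    """Run the check inside repo_dir and return the process exit code."""
    return subprocess.run(build_check_command(**options), cwd=repo_dir).returncode
```

Calling run_check(".", all_ids=True, fossils=True, debug=True) would run duplicacy -d check -chunks -all -fossils in the current directory.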

gchen · 3 October 2021 04:36

The list of chunks that have already been verified is stored in the file .duplicacy/cache/storage/verified_chunks. You can delete this file and Duplicacy will verify all chunks again.
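
Deleting that file can be scripted. A minimal sketch, assuming the relative path quoted above matches your setup (the directory under cache may be named differently on your machine); forget_verified_chunks is a hypothetical helper name, not a Duplicacy function.

```python
from pathlib import Path

# Relative path as quoted in the post above; adjust it if your cache layout differs.
VERIFIED_CHUNKS = Path(".duplicacy") / "cache" / "storage" / "verified_chunks"


def forget_verified_chunks(repo_dir, relative_path=VERIFIED_CHUNKS):
    """Delete the list of verified chunks; return True if a file was removed."""
    target = Path(repo_dir) / relative_path
    if target.is_file():
        target.unlink()
        return True
    return False
```

After a call that returns True, the next check -chunks run has no record of earlier verifications and should go through every chunk again.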

96e2ad680c5a2575bf90 · 4 October 2021 17:14

hm, good to know.
However, is there some way (e.g. a switch) to have duplicacy ignore the verified chunks cache,
so I can run a full verify e.g. once a week from schedules?

saspus · 4 October 2021 17:53

Once a chunk has been verified, it is known to have been transferred to the storage correctly. From then on it is the responsibility of the storage to keep it consistent; there is no point in verifying the same chunks again.

If your storage does not provide those consistency guarantees (i.e. it’s a simple hard drive), replace it with one that does (e.g. a RAID array with btrfs or ZFS, or at the very least that same hard drive formatted with the btrfs DUP profile) and run scrub periodically to keep finding and fixing corruption as it occurs, thus maintaining data viability.

Running check periodically does not solve anything: knowing that corruption occurred does not fix it; the data loss has already happened.

You can of course write a pre-check script that deletes the verified_chunks file, but IMO that would be solving the wrong problem…

96e2ad680c5a2575bf90 · 4 October 2021 19:04

Granted … having no redundancy is not a good way to go.
However, a plain disk is reasonable to me when only limited
or no bandwidth is available to access my redundant network storage
while abroad. And when you need to stick to Windows,
there are not many other options.

To have at least some kind of fallback, I keep 2 backups
with scheduled checks on my mobile disk in case one gets broken.

However, until I recently ran a manual check on the disk,
I did not realize that some chunks were broken due to hard errors,
even though I had checks scheduled.

Having specified -chunks, I’d have expected duplicacy to run a full check.
Duplicacy verifying only new chunks is somewhat … say, unexpected … to me,
but maybe I should have had a closer look at the manual …

However, having learnt this lesson, I will set up an OS schedule to remove the cache,
but I would very much appreciate an option like check -allchunks
to check ALL chunks.

saspus · 4 October 2021 19:32

Oh… indeed, not much in terms of direct attached storage. Maybe switch to a portable SSD instead? Unlike HDDs, SSDs run refresh/repair cycles continuously (when powered), so there is a smaller chance of data loss due to media deterioration. I keep a local backup on a Samsung T5 SSD (which is just an mSATA SSD with a USB bridge).

You can write pre-check.bat with rmdir /s ... and place it in .duplicacy/scripts; it will be executed before check. This might be preferable to an OS schedule, as it guarantees that the cache is gone right before check runs.
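
A hook that clears the cache on every run would make every scheduled check a full one, while the request earlier in the thread was for a full verify about once a week. A rough sketch of that gating follows, assuming a small stamp file is kept next to the cache to remember the last reset. The stamp file and the name reset_if_due are hypothetical, and both paths must be supplied by the caller to match the actual cache location.

```python
import time
from pathlib import Path


def reset_if_due(verified_list, stamp_file, interval_days=7, now=None):
    """Delete verified_list if the last recorded reset is older than interval_days.

    Returns True when a reset happened, meaning the next check re-verifies everything.
    """
    verified_list = Path(verified_list)
    stamp_file = Path(stamp_file)
    now = time.time() if now is None else now

    # The stamp file holds the time of the last reset as seconds since the epoch.
    last = float(stamp_file.read_text()) if stamp_file.exists() else 0.0
    if now - last < interval_days * 86400:
        return False  # too soon: keep the cache, only new chunks get verified

    if verified_list.exists():
        verified_list.unlink()
    stamp_file.write_text(str(now))
    return True
```

Called from a pre-check script, this leaves the cache alone on most runs and removes it only when the interval has passed.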

A dedicated option for a full re-check sounds reasonable.

Have you considered enabling erasure coding? It reduces the probability of a bad block affecting the restorability of your data, within reason, but of course it is not foolproof.

96e2ad680c5a2575bf90 · 4 October 2021 21:12

@gchen
Just had a look at the wiki.

The check -chunks documentation does not mention that it uses that caching approach for “older” chunks.
Maybe a few words should be added there?

gchen · 5 October 2021 03:44

I’ve updated the Wiki page as well as the user guide. Thanks for pointing that out!