Best way to check integrity after storage system failure?

editor1.shell · 6 December 2021 12:03

A RAID 6 volume holding a 55 TB, 8-million-chunk Duplicacy backup in a single ‘storage’ failed during a shrink operation, which crashed the whole volume. I restored the volume, but it is very likely that some of the data (chunks) is now corrupt.

What’s the best way to check integrity and restore missing chunks, if there are any?

Run check, then prune, and then re-run the jobs?

Or is there a better way?

saspus · 6 December 2021 15:39

Run check -all -chunks -fossils -persist. This verifies the integrity of every chunk in every snapshot, and -persist keeps the check going past chunk errors instead of stopping at the first one.
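For reference, a minimal sketch of running that check while keeping the output for later review. It assumes the command is run from inside an initialized repository; the `check.log` file name and the `PRINT_ONLY` switch are my own placeholders, not Duplicacy features.

```shell
#!/bin/sh
# Sketch: run the full chunk verification and keep a copy of the output.
# Assumptions: run from the repository root; LOG_FILE and PRINT_ONLY are
# hypothetical helper variables for this script only.
LOG_FILE="${LOG_FILE:-check.log}"
PRINT_ONLY="${PRINT_ONLY:-1}"   # 1 = only print the command, 0 = run it

run_check() {
    if [ "$PRINT_ONLY" = "1" ]; then
        echo "duplicacy check -all -chunks -fossils -persist"
    else
        # tee keeps a copy of everything the check reports.
        duplicacy check -all -chunks -fossils -persist 2>&1 | tee "$LOG_FILE"
    fi
}

run_check
```

With 8 million chunks this will take a long time, so set `PRINT_ONLY=0` only once you are ready to let it run to completion.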

Then assemble a list of bad snapshots from the log and delete them manually from the snapshots/*/ folders (depending on which chunks got corrupted, prune might fail, e.g. if one of the corrupted chunks is a snapshot chunk).
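The manual deletion could be scripted roughly like this. It is only a sketch and assumes the storage is a local directory in which each revision is a file at snapshots/<snapshot id>/<revision number>. The list file (one "snapshot_id revision" pair per line) is something you assemble by hand from the check log; the function name and the list format are hypothetical.

```shell
#!/bin/sh
# Sketch: delete the revision files named in a hand-made list.
# Usage: remove_bad_revisions STORAGE_DIR LIST_FILE
# LIST_FILE format (hypothetical): "<snapshot id> <revision>" per line,
# blank lines and lines starting with '#' are ignored.
remove_bad_revisions() {
    storage_dir="$1"
    list_file="$2"
    while read -r snapshot_id revision || [ -n "$snapshot_id" ]; do
        # Skip blank lines and comments.
        case "$snapshot_id" in ''|'#'*) continue ;; esac
        target="$storage_dir/snapshots/$snapshot_id/$revision"
        if [ -f "$target" ]; then
            rm -- "$target"
            echo "removed $snapshot_id revision $revision"
        else
            echo "not found: $snapshot_id revision $revision"
        fi
    done < "$list_file"
}
```

Only the small revision files are removed here; the chunks they referenced stay in place until the prune step.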

Then delete the local cache, make sure no other process touches the datastore for the duration of the following command, and run prune -all -exhaustive -exclusive to remove orphaned chunks and free up space.
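As a sketch, the cache cleanup and prune might look like the following. It assumes the CLI keeps its local cache under .duplicacy/cache in the repository root (check where yours actually lives before deleting anything); `REPO_DIR` and `PRINT_ONLY` are placeholders of mine.

```shell
#!/bin/sh
# Sketch: drop the local cache, then prune orphaned chunks.
# Assumptions: REPO_DIR is the repository root and the cache is at
# REPO_DIR/.duplicacy/cache. PRINT_ONLY is a hypothetical safety switch.
REPO_DIR="${REPO_DIR:-.}"
PRINT_ONLY="${PRINT_ONLY:-1}"   # 1 = only print the commands, 0 = run them

clean_and_prune() {
    if [ "$PRINT_ONLY" = "1" ]; then
        echo "rm -rf $REPO_DIR/.duplicacy/cache"
        echo "duplicacy prune -all -exhaustive -exclusive"
    else
        rm -rf -- "$REPO_DIR/.duplicacy/cache"
        # -exclusive assumes nothing else is using the storage right now.
        (cd "$REPO_DIR" && duplicacy prune -all -exhaustive -exclusive)
    fi
}

clean_and_prune
```

Because -exclusive deletes chunks immediately, stop every scheduled backup, check, or copy job that touches this storage before setting `PRINT_ONLY=0`.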

This is the most straightforward approach. You can also attempt to repair the snapshots instead of deleting them, but this is more involved and not guaranteed to succeed. To do that, after the check above, delete the bad chunks, then create a new temporary snapshot ID and run a backup. This will cause the now-missing chunks to be uploaded to the storage again and, if the stars align, the data will be packed the same way, thereby re-creating the deleted chunks. Then follow with check -all, delete the (now hopefully fewer) bad snapshots, and prune as above.
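One possible way to set up the temporary snapshot ID, sketched below, is to register the same storage a second time under a different name and ID with duplicacy add, then back up to it. This is an assumption about how to do it, not the only route, and all three values are placeholders (an encrypted storage would also need -e on the add command). The script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: print the commands for the repair attempt.
# All three values are hypothetical placeholders; replace them with your own.
TEMP_STORAGE_NAME="${TEMP_STORAGE_NAME:-repair}"
TEMP_SNAPSHOT_ID="${TEMP_SNAPSHOT_ID:-repair-temp}"
STORAGE_URL="${STORAGE_URL:-/path/to/storage}"

print_repair_plan() {
    # 1. Register the existing storage again under a new snapshot ID.
    echo "duplicacy add $TEMP_STORAGE_NAME $TEMP_SNAPSHOT_ID $STORAGE_URL"
    # 2. Back up under the new ID so missing chunks get uploaded again.
    echo "duplicacy backup -storage $TEMP_STORAGE_NAME"
    # 3. Re-check which snapshots are still missing chunks.
    echo "duplicacy check -all"
}

print_repair_plan
```

The bad chunks still have to be deleted from the storage by hand before the backup, otherwise the backup will see them as already present and skip them.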