Yes, I’m not sure whether you are agreeing or disagreeing, but to clarify what I meant:
There are two parts to this:
- Verifying the consistency of duplicacy’s “datastore” and the restorability of all/any file(s), assuming the storage is reliable.
- Detecting data rot (verifying the assumption above)
The first item is absolutely duplicacy’s responsibility, and it is already implemented. This helps detect datastore corruption due to logical bugs (I use “datastore” here in the general sense of “database”; in duplicacy’s case the implementation allows for quick checks that boil down to reading the snapshot files and then verifying that the specific chunk files they reference merely exist).
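To illustrate the idea only (this is not duplicacy’s actual code, and the on-storage layout is deliberately simplified to plain-text snapshot files plus chunks stored as chunks/<hash>), such a quick check boils down to something like:

```go
// Hypothetical sketch of the "quick check" idea, not duplicacy's actual code.
// Assumes a simplified layout: each snapshot file is a plain-text list of
// chunk hashes, and each chunk is stored as chunks/<hash> on the storage.
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
)

// checkSnapshot verifies that every chunk referenced by a snapshot file
// exists on the storage. It never reads chunk contents, which is what makes
// the check cheap: it only confirms the datastore is internally consistent.
func checkSnapshot(storageRoot, snapshotPath string) error {
	f, err := os.Open(snapshotPath)
	if err != nil {
		return err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		hash := scanner.Text()
		if hash == "" {
			continue
		}
		chunkPath := filepath.Join(storageRoot, "chunks", hash)
		if _, err := os.Stat(chunkPath); err != nil {
			return fmt.Errorf("missing chunk %s referenced by %s", hash, snapshotPath)
		}
	}
	return scanner.Err()
}

func main() {
	if err := checkSnapshot("/mnt/backup", "/mnt/backup/snapshots/laptop/1"); err != nil {
		fmt.Fprintln(os.Stderr, "check failed:", err)
		os.Exit(1)
	}
	fmt.Println("all referenced chunks exist")
}
```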
Anything beyond that is indistinguishable from “restore everything into /dev/null”: the only way to verify that files are restorable from flaky storage is to actually restore them, touching all the bits involved along the way. But this is very expensive to do from the client (in both the technical and the financial sense) and is therefore not realistically suitable for periodic checks on most storage backends. (Most storage APIs operate on the assumption that the storage is reliable and therefore do not even provide facilities to force remote verification in the first place.)
Maintaining the integrity of the chunk files themselves (i.e. guarding against bit rot) should therefore be done, and traditionally is done, on the server, usually by delegating it to the underlying storage (e.g. the built-in checksumming of filesystems such as ZFS and BTRFS, or the service-level durability guarantees of S3-type storage).
If the storage in use does not provide any of those facilities, then yes, I absolutely agree that the user should implement out-of-band checks that periodically re-hash everything, for example via the rsync remote checksum support you mentioned, or other server-side means.
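To make that concrete: if the storage is just a plain filesystem on a server you control, such an out-of-band check could look roughly like the sketch below. This is not an existing tool; the baseline file, its one-hash-per-line format and the paths are made up for illustration.

```go
// Rough sketch of a server-side out-of-band re-hash check, run on the machine
// that hosts the storage. The baseline file format (one "<sha256>  <relative
// path>" per line) and all paths are made up for illustration.
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	storageRoot := "/mnt/backup"
	baseline, err := os.Open("/var/lib/rehash/baseline.txt")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer baseline.Close()

	// Re-hash every file recorded in the baseline and flag any mismatch:
	// chunk files are immutable, so a changed hash means bit rot (or tampering).
	scanner := bufio.NewScanner(baseline)
	for scanner.Scan() {
		var want, rel string
		if _, err := fmt.Sscanf(scanner.Text(), "%s %s", &want, &rel); err != nil {
			continue
		}
		got, err := hashFile(filepath.Join(storageRoot, rel))
		if err != nil || got != want {
			fmt.Printf("CORRUPT or unreadable: %s\n", rel)
		}
	}
}
```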
There is a third part:
- Recovery from data rot.
But I don’t think duplicacy, like most other backup software, was designed with that in mind; it would require storage redundancy, and therefore inflated storage costs, to take over what is essentially the filesystem’s job.
Instead, and in theory, a third-party tool could remotely retrieve hashes of whatever type it wanted and dump them to a file. You could run it regularly against a Duplicacy storage, processing only new files (chunks and snapshots) for efficiency. In fact, I wonder whether rclone might already be able to do this?
Such a third-party tool would have to run server-side, though; otherwise it is indistinguishable from downloading the entire datastore every time.
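Purely as an illustration of what I mean (the file layout, paths and dump format are all assumptions, and I don’t know whether rclone can cover the remote-hashing part), such a tool running next to the storage could be as simple as:

```go
// Illustrative sketch of the "third-party tool" idea, run server-side next to
// the storage: hash only files not already present in the dump and append
// them. Chunk and snapshot files never change in place, so previously
// recorded hashes stay valid. File names and layout are assumptions.
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

func main() {
	storageRoot := "/mnt/backup"
	dumpPath := "/var/lib/rehash/baseline.txt"

	// Load the set of files already hashed on previous runs.
	seen := map[string]bool{}
	if f, err := os.Open(dumpPath); err == nil {
		scanner := bufio.NewScanner(f)
		for scanner.Scan() {
			var hash, rel string
			if _, err := fmt.Sscanf(scanner.Text(), "%s %s", &hash, &rel); err == nil {
				seen[rel] = true
			}
		}
		f.Close()
	}

	out, err := os.OpenFile(dumpPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer out.Close()

	// Walk the storage; hash and record only files not seen before.
	err = filepath.Walk(storageRoot, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() {
			return err
		}
		rel, _ := filepath.Rel(storageRoot, path)
		if seen[rel] {
			return nil
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		h := sha256.New()
		_, err = io.Copy(h, f)
		f.Close()
		if err != nil {
			return err
		}
		_, err = fmt.Fprintf(out, "%s  %s\n", hex.EncodeToString(h.Sum(nil)), rel)
		return err
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Run something like that from cron and feed the resulting dump into a verification pass like the one sketched earlier, and you get periodic bit-rot detection without ever downloading a chunk to the client.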