Can a file hash check be carried out on remote files, comparing against the hash Duplicacy has stored, without downloading them (by getting info on the remote file from the provider instead)?

I would like to check that my backups are good, but cloud providers can charge for downloads and a full restore takes a long time. Is there a method that is better than the existing check, which only verifies that a file/chunk exists, but not as “expensive” as having to download the full backup?

As most(?) cloud providers provide a hash of the file stored on their service, could Duplicacy perhaps compare that against the value it stored when the file was originally uploaded?

Even if there is a small cloud provider API charge to get the file hash of the remote file it would still be much cheaper and way faster than having to download the full backup.

I do not even know if Duplicacy stores a SHA1 hash of the files it creates, but as a backup is only worth having if it has been properly validated, it would be great to have something in the middle of the current options.

I realize there are many unknowns, including whether Duplicacy stores a SHA1 hash and whether the remote file system updates the SHA1 if a remote file is corrupted, but I think it is worth asking the question.

duplicacy check -hashes (fictional example)

This would do exactly what check currently does, but if -hashes is given, the SHA1 would be requested from the cloud provider (if it is not already returned by listing the files) and checked against what Duplicacy generated when the file was first uploaded.
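To make the idea concrete, here is a rough sketch in Python of the comparison I have in mind. Everything in it is an assumption for illustration: the -hashes flag does not exist, `compare_hashes` is a made-up helper, and it presumes that the client recorded a SHA1 of each chunk file exactly as it was uploaded and that the provider reports a SHA1 in its file listing. Whether Duplicacy records such a hash is one of the unknowns mentioned above.

```python
import hashlib


def compare_hashes(recorded, reported):
    """Compare hashes recorded at upload time with hashes reported by the provider.

    Both arguments map chunk file name -> hex SHA-1 (hypothetical data layout).
    Returns (missing, mismatched) as sorted lists of chunk names.
    """
    missing = sorted(name for name in recorded if name not in reported)
    mismatched = sorted(
        name
        for name, digest in recorded.items()
        if name in reported and reported[name].lower() != digest.lower()
    )
    return missing, mismatched


# Hashes the client would have saved when it uploaded each chunk (assumed).
recorded = {
    "chunks/aa/01": hashlib.sha1(b"chunk one").hexdigest(),
    "chunks/bb/02": hashlib.sha1(b"chunk two").hexdigest(),
    "chunks/cc/03": hashlib.sha1(b"chunk three").hexdigest(),
}

# What the provider's listing would return (assumed): one chunk differs, one is gone.
reported = {
    "chunks/aa/01": hashlib.sha1(b"chunk one").hexdigest(),
    "chunks/bb/02": hashlib.sha1(b"chunk tw0").hexdigest(),
}

missing, mismatched = compare_hashes(recorded, reported)
print("missing:", missing)
print("mismatched:", mismatched)
```

In this toy data the third chunk is reported as missing and the second as mismatched, all without downloading any chunk content.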

Thanks

There are only two possibilities: download the chunk and check the hash (which causes egress), or ask the provider to check the hash for you server-side.

But if you don’t trust your provider to keep data intact, why would you trust it to actually compute the hash rather than, say, return a cached, pre-computed, stale good value? Or to hash a good copy in hot cache when the cold data has already rotted?
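A toy model of that failure mode, sketched in Python. `CachingProvider` is entirely hypothetical and does not describe how any real provider behaves; it only assumes a service that computes the hash once at upload time and serves that cached value afterwards.

```python
import hashlib


class CachingProvider:
    """Toy provider that computes the hash once at upload and caches it."""

    def __init__(self):
        self._data = {}
        self._cached_sha1 = {}

    def upload(self, name, content):
        self._data[name] = content
        self._cached_sha1[name] = hashlib.sha1(content).hexdigest()

    def reported_sha1(self, name):
        # Returns the value cached at upload time; never re-reads the stored bytes.
        return self._cached_sha1[name]

    def download(self, name):
        return self._data[name]

    def corrupt(self, name):
        # Simulate bit rot: flip the lowest bit of the first byte at rest.
        content = self._data[name]
        self._data[name] = bytes([content[0] ^ 0x01]) + content[1:]


provider = CachingProvider()
original = b"chunk payload"
expected = hashlib.sha1(original).hexdigest()

provider.upload("chunk-01", original)
provider.corrupt("chunk-01")

# Server-side check: compares against the stale cached hash.
remote_check_ok = provider.reported_sha1("chunk-01") == expected
# Client-side check: hashes the bytes that are actually stored.
download_check_ok = hashlib.sha1(provider.download("chunk-01")).hexdigest() == expected

print("remote hash check passes:", remote_check_ok)
print("download-and-hash check passes:", download_check_ok)
```

Under these assumptions the remote hash check still passes after the data has been corrupted, while hashing the downloaded bytes catches it.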

Hence, in my opinion, the remote hash check is pointless. Use providers that offer data integrity guarantees. The Duplicacy check command is useful for checking datastore consistency: that all needed pieces are in place. Integrity of those pieces should be the responsibility of the provider.

What storage are you using? There are ways to do full checks for free. For example, with B2 by egressing to Cloudflare, with Oracle by egressing to their compute cloud, similarly on Amazon, etc.


7 posts were split to a new topic: Transferring backup between providers, B2, cloudflare, storj