I think this is wrong on so many levels I don’t know where to start. My case patently proves the opposite - the storage wasn’t aware of the bad data, and it isn’t the storage’s job to verify the integrity of the backup data when only Duplicacy can - say, if a bug or memory corruption occurs.
After all, there was no bit-rot here. No amount of redundancy or bit-rot detection was going to help. A different fs, perhaps? Resource monitoring, perhaps? I soon learned of the failure either way.
However, even when disk space was freed up, Duplicacy continued to run backups referencing bad chunks, and it was only running different types of `check` that saved me.
Again, this is verifiably inaccurate. It wasn’t “too late”, nor did the backup “rot”. A `check -chunks` was necessary in order to fix the broken backup storage, and I was able to do exactly that. The filesystem or storage wasn’t going to help with that.
One thing I learned is that leaving referenced, or even unreferenced, bad chunks in the storage can be a Very Bad Thing™. The best way to deal with that is to run `prune -exhaustive` to remove them.
Otherwise, subsequent backups might re-reference those chunks - due to the deterministic nature of chunk hashing - i.e. Duplicacy sees those chunks already in the storage, assumes they’re good, and doesn’t bother to re-upload them.
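For anyone in the same boat, the recovery sequence looks roughly like this (a sketch only; the revision numbers are placeholders and the exact steps depend on your storage backend):

```
# 1. Identify the bad chunks (downloads and verifies every chunk once).
duplicacy check -chunks

# 2. Delete the truncated/corrupt chunk files it reports from the storage
#    (how depends on the backend), then prune the revisions that reference
#    them - the revision numbers below are examples only.
duplicacy prune -r 123 -r 124

# 3. Remove any chunks no longer referenced by any snapshot, so later
#    backups cannot silently re-reference them.
duplicacy prune -exhaustive

# 4. Run a fresh backup; -hash re-reads every file instead of trusting
#    timestamps, so any chunks now missing from the storage get re-uploaded.
duplicacy backup -hash -stats

# 5. Confirm the storage is consistent again.
duplicacy check -chunks
```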
IMO you have these the wrong way around. The reason `-chunks` works better is that it quickly validates every chunk in the storage without downloading any chunk more than once.
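To illustrate the difference (both run from inside the repository; `-a` checks all snapshots):

```
# Verifies every chunk in the storage, downloading each chunk only once.
duplicacy check -chunks

# Verifies every file in every revision by reconstructing it from its
# chunks, so shared chunks can be downloaded over and over - far slower
# on a large storage.
duplicacy check -files -a
```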
In my case, I had a dozen or so failed and ‘successful’ backups - after the storage had run out of space - all referencing truncated chunks. As far as Duplicacy and the storage were concerned, there was nothing wrong. When the latest backup references bad chunks, subsequent backups will most likely not have full integrity, yet will still succeed in creating a snapshot.
Thus using `-chunks` instead of `-files` saved me a lot of time in fixing the storage and getting back up and running. A later `check -files` on the last revision only also confirmed that the storage was in good nick, although I might later do a proper `restore` (as I often do).
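That follow-up verification was along these lines (the revision number is a placeholder, and the restore needs its own initialised repository pointing at the same storage):

```
# Verify full file hashes for just the latest revision (250 is a placeholder).
duplicacy check -files -r 250

# The ultimate test: an actual restore of that revision into a separate,
# freshly-initialised repository on the same storage.
duplicacy restore -r 250 -stats
```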
I actually think this is somewhat of a flaw (or hole) in Duplicacy’s design…
Plenty of people here have come across 0-byte chunks/snapshots and, thankfully, a regular `check` now tests for that (although, annoyingly, it stops processing further when it finds one). What about non-0-byte bad chunks? IMO, Duplicacy needs more, and quicker, integrity checks.
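In the meantime, on a local (or locally-mounted) storage you can at least spot-check for truncated chunk files yourself - a rough sketch, assuming the storage lives at /mnt/backup-storage:

```
# Zero-byte chunk files should never exist.
find /mnt/backup-storage/chunks -type f -size 0

# Suspiciously small chunks are worth a look too (the 1k threshold is arbitrary).
find /mnt/backup-storage/chunks -type f -size -1k
```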
Perhaps a verification stage at the end of a backup job, which tests that all newly written chunks exist, have the correct file sizes, and the correct hashes (for storage backends that support remote hashing).
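Until something like that is built in, the closest approximation I can think of is wrapping the backup so a verification runs straight afterwards - a sketch only, using existing commands:

```
#!/bin/sh
set -e

duplicacy backup -stats

# Not as cheap as a size or remote-hash comparison would be, but it catches
# truncated or corrupt chunks immediately rather than weeks later.
duplicacy check -chunks
```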
At the end of the day, I strongly believe both the storage and Duplicacy need to work in tandem to ensure backup integrity. Plus, having multiple backup copies and testing them regularly (`check`, `-chunks`, `-files` and `restore`) is essential.
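Something along these lines on a schedule covers most of those bases (example crontab entries only - paths, frequencies and how much you can afford to download are up to you):

```
# Nightly: cheap check that every referenced chunk exists.
0 2 * * *   cd /path/to/repo && duplicacy check -a
# Weekly: download and verify the integrity of every chunk.
0 3 * * 0   cd /path/to/repo && duplicacy check -chunks
# Monthly: verify full file hashes across all revisions (expensive).
0 4 1 * *   cd /path/to/repo && duplicacy check -a -files
# Plus an occasional manual restore into a scratch repository.
```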