While I have encountered the odd corrupted chunk before, and had to take care to delete it manually, more frequently I've had situations where Duplicacy/Vertical Backup has created a lot of 0-byte chunks/snapshot files.
Usually this is the result of running out of disk space. Now obviously I should monitor disk space and alert before it becomes a problem, but one of the other issues I'm dealing with on one headless system is that Duplicacy (CLI) sometimes doesn't prune as much as it should. I can't yet pinpoint why this happens; only that the logs fill up with 'Marked fossil' lines and nothing else (no deleted fossils, no fossil collection saved at the end). As a result, the disk fills up after a couple of months and I have to perform an -exclusive -exhaustive prune, and everything is right again: lots of disk space freed, and pruning starts working again.
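For reference, the manual fix I end up running looks roughly like this - the retention values are just placeholders for whatever policy you use, and -exclusive of course requires that nothing else is backing up to or pruning that storage at the time:

```
# placeholder retention policy; the important parts are -exhaustive and -exclusive
duplicacy prune -a -keep 0:365 -keep 7:30 -exhaustive -exclusive
```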
IMO Duplicacy needs the ability to self-heal a backup storage - particularly for 0-byte chunks/snapshots, and possibly for provably bad chunks. Otherwise the user has to manually tinker with said storage - an added pain if it's cloud-based, since a different tool is necessary - and mistakes can happen.
Instead of deleting bad chunks, could it not rename them to .bad when the user specifically asks it to -heal during a check operation?
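Something like this is what I have in mind - to be clear, -heal is only a proposed flag, not something that exists today:

```
# proposed, not an existing flag: verify chunks and rename any 0-byte or
# provably bad ones to <chunk>.bad instead of deleting them
duplicacy check -chunks -heal
```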
Then, after a heal, the recommended strategy might be to run the next backup with a new flag (saw it mentioned recently) that skips loading the chunk cache from the previous backup and doesn't assume all chunks are present, instead checking that each chunk exists in the storage before attempting to upload it (sketched below).
This is better than changing the snapshot ID to force a complete re-upload (and optionally changing it back afterwards), since the user may want to keep the same ID. And better than the user manually deleting chunks from the storage just to get back up and running again.
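So the recovery flow might end up looking roughly like this, where -no-chunk-cache is just a made-up placeholder for whatever the real flag is (or ends up being) called:

```
# 1. quarantine bad chunks (proposed -heal, as above)
duplicacy check -chunks -heal

# 2. re-run the backup without trusting the local chunk cache, checking the
#    storage for each chunk before deciding whether to upload it
#    (-no-chunk-cache is a placeholder name, not a real flag)
duplicacy backup -no-chunk-cache
```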
Also, the logging around whether a chunk holds file data or metadata could be improved, as the possible remedies depend on which is which.