How can a just-created snapshot revision be "missing chunks"?

I have a Duplicacy repository that backs up my laptop’s home directory to a B2 bucket. It has been running for a couple of years; the revision numbers are up to 1500.

At some point along the way something went wrong (probably because prune was run from two machines while I was migrating from my old laptop to a new one), and now check reports “missing chunks”.

I can understand that if some chunks referenced by a snapshot revision were deleted from the storage, for whatever reason, then the check command would report “missing chunks” for that revision.
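At its core, that kind of check is a set-membership test over the chunk IDs a revision references. Here is a minimal sketch that assumes a toy model: a revision is just an ordered list of chunk IDs, and the storage is the set of chunk IDs actually present. `find_missing_chunks` is a hypothetical name, not Duplicacy's API.

```python
def find_missing_chunks(revision: list[str], stored: set[str]) -> list[str]:
    """Return chunk IDs referenced by a revision but absent from storage.

    Toy model: `revision` is the list of chunk IDs the snapshot refers to,
    `stored` is the set of chunk IDs that actually exist in the bucket.
    """
    seen: set[str] = set()
    missing: list[str] = []
    for chunk_id in revision:
        # Report each missing chunk once, in the order it is first referenced.
        if chunk_id not in stored and chunk_id not in seen:
            missing.append(chunk_id)
        seen.add(chunk_id)
    return missing


print(find_missing_chunks(["a1", "b2", "a1", "c3"], {"a1", "c3"}))  # ['b2']
```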

However, I’m seeing “missing chunks” reported for a snapshot revision that was created just now, with no prune operation since the revision was created.

How is this possible?

I thought that Duplicacy’s backup operation worked like this:

  1. Crawl the entire source tree (in this case, my laptop home directory).
  2. As directories and files are read, break this source data into “chunks”.
  3. When a chunk is created, calculate its hash; this becomes the name of the chunk.
  4. See if a chunk with this hash already exists in the storage; if not, upload it.
  5. Keep track of all the chunk hashes.
  6. When the whole tree has been traversed, record the list of chunks; this is what the “revision” actually contains.
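A toy version of those six steps shows why, under that model, a fresh revision can never reference a missing chunk. Each chunk is checked against the storage before it is recorded. This is a sketch under stated assumptions, not Duplicacy's code: the storage is a dict, chunk names are plain SHA-256 digests, and files are split into fixed 4-byte pieces purely for brevity.

```python
import hashlib


def split_into_chunks(data: bytes, size: int = 4) -> list[bytes]:
    """Fixed-size splitting, for brevity only; real chunkers (Duplicacy's
    included) use variable-size, content-defined chunk boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def full_backup(files: dict[str, bytes], storage: dict[str, bytes]) -> list[str]:
    """Steps 1-6 above: chunk every file, upload unknown chunks, return the chunk list."""
    revision: list[str] = []
    for path in sorted(files):                        # 1. crawl the source tree
        for chunk in split_into_chunks(files[path]):  # 2. break data into chunks
            cid = hashlib.sha256(chunk).hexdigest()   # 3. the hash names the chunk
            if cid not in storage:                    # 4. upload only if absent
                storage[cid] = chunk
            revision.append(cid)                      # 5. keep every chunk hash
    return revision                                   # 6. the revision is this list


bucket: dict[str, bytes] = {}
rev = full_backup({"a.txt": b"hello world!", "b.txt": b"hello"}, bucket)
print(len(rev), len(bucket))              # 5 4  ("hell" is stored once)
print(all(cid in bucket for cid in rev))  # True
```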

I must be wrong about some aspect of the above, because if Duplicacy followed these steps on every backup, I don’t see how any chunks could be missing from a new revision immediately after creation.

Please help me understand.

This is not a full answer to my own question, but it’s information that clarifies the situation a little:

I ran backup again on the snapshot ID with missing chunks, this time with the -hash option. According to check, the resulting revision is complete (no missing chunks). This suggests that backup -hash is closer to the algorithm I described above.
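That fits how Duplicacy's CLI help describes -hash: it detects file differences by hash rather than by size and timestamp. The sketch below shows the two change-detection modes side by side. It assumes that the previous revision's metadata records size, mtime, and chunk IDs for each file. `FileEntry` and `files_to_rechunk` are hypothetical names, not Duplicacy's code.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical per-file record from the previous revision's metadata."""
    size: int
    mtime: float
    chunk_ids: list[str]


def files_to_rechunk(current: dict[str, tuple[int, float]],
                     previous: dict[str, FileEntry],
                     hash_mode: bool) -> set[str]:
    """Return the paths that will be re-read and re-chunked.

    Default mode: a file whose size and mtime match the previous revision is
    treated as unchanged, so its chunk IDs would be copied from the old
    metadata without reading the file.
    Hash mode: every file is re-read, regardless of size and mtime.
    """
    if hash_mode:
        return set(current)
    changed: set[str] = set()
    for path, (size, mtime) in current.items():
        prev = previous.get(path)
        if prev is None or (prev.size, prev.mtime) != (size, mtime):
            changed.add(path)
    return changed


previous = {
    "notes.txt": FileEntry(12, 1000.0, ["c1", "c2"]),
    "photo.jpg": FileEntry(5000, 900.0, ["c3"]),
}
current = {"notes.txt": (12, 1000.0), "photo.jpg": (5000, 950.0), "new.md": (3, 1100.0)}
print(sorted(files_to_rechunk(current, previous, hash_mode=False)))  # ['new.md', 'photo.jpg']
print(sorted(files_to_rechunk(current, previous, hash_mode=True)))   # ['new.md', 'notes.txt', 'photo.jpg']
```

In default mode, notes.txt is never re-read, so nothing in this step would notice if its old chunks had disappeared from the storage.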

I’d still like some help understanding why chunks may be missing when the -hash option is not used.

Once a revision reports missing chunks, does it stay broken or does it eventually get fixed?

Does this reproduce after deleting the local cache? Perhaps the cache is no longer consistent with the storage, so Duplicacy may create a metadata chunk that references a chunk that is actually missing from the storage. The cache is supposed to hold only chunks that exist in the storage, which is the source of truth, but this is a plausible corner case.
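Here is a toy model of that hypothesis. It assumes that the backup consults a local record of chunks "believed present" before asking the storage, and skips the upload on a hit. `store_chunk` and `believed_present` are hypothetical names standing in for whatever the real cache does.

```python
def store_chunk(chunk_id: str, data: bytes, storage: dict[str, bytes],
                believed_present: set[str]) -> bool:
    """Toy upload step; returns True if the chunk was actually uploaded.

    `believed_present` stands in for locally cached knowledge of which chunks
    exist. It is consulted first, and a hit skips the storage lookup entirely.
    """
    if chunk_id in believed_present:
        # Trusted without asking the storage. If this record is stale, the
        # chunk is skipped even though the storage no longer has it.
        return False
    uploaded = chunk_id not in storage
    if uploaded:
        storage[chunk_id] = data
    believed_present.add(chunk_id)
    return uploaded


bucket = {"c1": b"one"}
stale = {"c1", "c2"}  # c2 was deleted from the bucket at some point
print(store_chunk("c2", b"two", bucket, stale), "c2" in bucket)  # False False
print(store_chunk("c2", b"two", bucket, set()), "c2" in bucket)  # True True
```

In this model, emptying the stale record forces a real storage lookup, and the missing chunk gets uploaded again.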

Here, in UploadSnapshot:

So a new revision can reference missing chunks when those chunk IDs were present in the previous snapshot's metadata but the corresponding chunk objects had been deleted from storage earlier for some reason (e.g. an interrupted prune?).
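The sketch below walks through that scenario in a toy model. Chunk IDs for unchanged files are copied forward from the previous revision without checking the storage. Once a chunk disappears, every later default backup inherits the dangling reference, and only a hash-mode run re-chunks the file and uploads the chunk again. Everything here is an assumption for illustration: each file is a single chunk, "unchanged" means the same mtime, and `backup` and `missing_chunks` are hypothetical names, not Duplicacy's functions.

```python
import hashlib

Revision = dict[str, tuple[float, list[str]]]  # path -> (mtime, chunk IDs)


def chunk_id(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def backup(files: dict[str, tuple[float, bytes]], previous: Revision,
           storage: dict[str, bytes], hash_mode: bool = False) -> Revision:
    """Toy incremental backup; each file is a single chunk for brevity."""
    revision: Revision = {}
    for path, (mtime, content) in files.items():
        prev = previous.get(path)
        if not hash_mode and prev is not None and prev[0] == mtime:
            # Looks unchanged: reuse the old chunk IDs without reading the
            # file or checking that those chunks still exist in storage.
            revision[path] = prev
            continue
        cid = chunk_id(content)
        if cid not in storage:
            storage[cid] = content  # re-uploads a chunk that had gone missing
        revision[path] = (mtime, [cid])
    return revision


def missing_chunks(revision: Revision, storage: dict[str, bytes]) -> list[str]:
    return sorted({c for _, ids in revision.values() for c in ids if c not in storage})


files = {"a.txt": (1.0, b"alpha"), "b.txt": (1.0, b"beta")}
bucket: dict[str, bytes] = {}
rev1 = backup(files, {}, bucket)
del bucket[chunk_id(b"beta")]              # e.g. an interrupted or conflicting prune
rev2 = backup(files, rev1, bucket)         # brand-new revision, already broken
print(missing_chunks(rev2, bucket) == [chunk_id(b"beta")])  # True
rev3 = backup(files, rev2, bucket)         # ...and it stays broken
print(len(missing_chunks(rev3, bucket)))   # 1
rev4 = backup(files, rev3, bucket, hash_mode=True)
print(missing_chunks(rev4, bucket))        # []
```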

It seems to have stayed broken for hundreds of revisions in a row, until I ran backup -hash.

Deleting the local cache did not fix it.

That does seem like it would lead to the problem I experienced.

To confirm: you deleted the local cache from under .duplicacy/repositories, ran backup to B2, immediately ran check, and the newly created revision fails check?

In my case, I ran

find ~/.duplicacy-web -name cache | while read d; do rm -rvf "$d"; done

which deleted the caches for all the Duplicacy Web UI backups.

Then, yes, I ran backup to B2, then check (both via the Web UI), and the newly created revision failed the check. The operations followed each other “immediately” in the sense that there were no prune operations in between, although a few hours elapsed due to the schedule of the human running the commands :slightly_smiling_face:

Ok, this is very puzzling.

Did you verify that the cache subfolders are actually gone, in case there are permission issues?
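Checking for leftovers is quick to script. The sketch below makes the same assumption as the find command above: caches live in directories literally named cache under ~/.duplicacy-web. `leftover_cache_dirs` is a hypothetical helper. Unlike a plain search, it also reports directories it could not list, which is where a permission problem would show up.

```python
import os
from pathlib import Path


def leftover_cache_dirs(root: Path) -> tuple[list[Path], list[str]]:
    """Return (directories named 'cache' under root, paths that could not be listed)."""
    found: list[Path] = []
    unreadable: list[str] = []
    if not root.is_dir():
        return found, unreadable
    # os.walk calls onerror with the OSError raised for a directory it cannot
    # list (e.g. permission denied); record it instead of skipping it silently.
    for dirpath, dirnames, _ in os.walk(root, onerror=lambda e: unreadable.append(str(e.filename))):
        if "cache" in dirnames:
            found.append(Path(dirpath) / "cache")
        # Don't descend into cache directories themselves.
        dirnames[:] = [d for d in dirnames if d != "cache"]
    return sorted(found), unreadable


if __name__ == "__main__":
    found, unreadable = leftover_cache_dirs(Path.home() / ".duplicacy-web")
    for p in found:
        print("still present:", p)
    for p in unreadable:
        print("could not read:", p)
    if not found and not unreadable:
        print("no cache directories left")
```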

Can you add another backup destination, e.g. in /tmp or on an SFTP server, perhaps for a subset of the data, and test backups there to rule out B2 shenanigans? (The fact that it works with -hash may be due to something timing-related.)

Also, I assume there are no failures from B2 in the backup log?