Interrupted "copy" operation, missing chunk in remote

This started with a PEBKAC issue: I rebooted a hypervisor while a “copy” operation was running from local storage to Google Drive. When the system came back up, I resumed the copy and everything seemed fine. However, during a subsequent prune operation (run right after the copy) I found a missing chunk in the remote.

I had assumed there would be a way to run the copy again and have it recognize the missing chunk and (re)copy it, but “copy” just sees all the snapshots in the remote as existing and goes on its merry way. Is there a way to get Duplicacy to “force copy” that chunk?

Both storages are RSA-encrypted with 5:2 erasure coding, if that makes any difference.

Edit: I did NOT use “bit-identical” when setting up the remote storage, so I don’t know whether a manual copy of the chunk would work properly. It would be helpful if the “copy” operation had a “compare chunks” function/argument that resolved this type of thing.

Edit to the Edit: Interestingly (to me), the chunk in question does NOT exist in the local (source) storage. This find command produced no result:

$ find <path to storage>/chunks -name be2a73b8580b8346bd59d3308696c312582690d387cf400e767f1e8adc438b90 -ls

Your missing chunk will be named 2a73b8580b8346bd59d3308696c312582690d387cf400e767f1e8adc438b90 (without the first two chars) inside <path to storage>/chunks/be. Though remember, there’s no point looking for it in the destination - even as a partial .tmp file - since you’re not using -bit-identical, the filenames there will be different.
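
For reference, a find that should actually match it in the source storage looks like this - a sketch assuming the default layout, where the first two hex characters of the hash become a directory level:

$ find <path to storage>/chunks -name 2a73b8580b8346bd59d3308696c312582690d387cf400e767f1e8adc438b90 -ls

Because -name only matches the basename, searching for the full hash can never hit the nested file, while the truncated name will.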

Personally, what I’d do is rename the snapshot revision file on the destination to <revision number>.bak and re-copy the snapshot. The snapshot file will be in <path to storage>/snapshots/<repository id>.
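
As a rough sketch, with 123 standing in for the actual revision number and “local”/“gdrive” standing in for whatever your storages are named in the preferences:

$ mv "<path to storage>/snapshots/<repository id>/123" "<path to storage>/snapshots/<repository id>/123.bak"
$ duplicacy copy -id <repository id> -r 123 -from local -to gdrive

The copy should then re-upload any chunks referenced by that revision which don’t already exist on the destination.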

It actually appears I was bitten by the cache. After finding lots of other chunk-missing-related posts, I blew away the .duplicacy/cache folder and the “problem went away”. This and all those other posts leave me wondering whether there isn’t room for improvement in the way the cache works. At the least, if a backup is interrupted, or Duplicacy realizes something is “missing”, or the cache otherwise doesn’t line up with reality, it feels like some automatic cleaning and recalculation should happen - beyond me punching the cache folder in the ding-ding with an “rm -rf” hammer.
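
For the record, the hammer in question, run from the repository root, was simply:

$ rm -rf .duplicacy/cache

Duplicacy just rebuilds the cache by re-downloading the metadata chunks it needs on the next run.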

It is a design flaw not to check the timestamps of chunks in the cache when pulling stuff out of there. Fixing it is on the short to-do list.

But I don’t think an interrupted copy operation can cause inconsistent cache data. Most of the time, a false missing-chunk issue is caused by manually deleting the latest revision from the storage and then running a backup from a different computer. In your case, I think it is more likely that there was some delay before Google Drive made the file available after you uploaded it. It may have nothing to do with the cache.