Deleted chunk during backup

Hi there. I have read through Fix missing chunks, but I can’t figure out what happened.

The prune operation run after the scheduled backup via the 2.1.1 UI on macOS started reporting this:

 15:08:29.705    Chunk 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89 does not exist in the storage

Searching through the logs for that chunk ID I only see this:

myhmbp:logs me$ g 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89
./backup-log-20180801-150513:102:2018-08-01 15:08:29.705 ERROR CHUNK_FIND Chunk 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89 does not exist in the storage
./backup-log-20180801-150513:104:ERROR Chunk 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89 does not exist in the storage
./prune-log-20180730-212848:14:Marked fossil 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89
./prune-log-20180730-230324:15:Deleted fossil 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89 (collection 1)
./prune-log-20180731-220344:2:Marked fossil 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89
./prune-log-20180801-150823:5:Marked fossil 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89
./backup-log-20180730-230000:108:2018-07-30 23:03:29.559 INFO CHUNK_DELETE The chunk 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89 has been permanently removed
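
For reference, g above is just a grep alias; without it, the same search over the logs directory is roughly:

grep -rn 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89 .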

I did invoke prune with the -exhaustive flag once (seemingly around the time the issue started, as far as I remember), but never with -exclusive; and the logs above seem to imply that the chunk was deleted not during prune, but during backup.

Running duplicacy -background check -a -fossils reports that everything is OK for all revisions except one, which is missing a lot of chunks.

What could have happened?
What additional logging can I provide to help triage this?
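
If it helps, I can re-run the check manually from the repository root with debug logging enabled; if I have the flags right, that would be something like:

duplicacy -d -log check -a -fossils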

No, backup never deletes chunks. The last line:

./backup-log-20180730-230000:108:2018-07-30 23:03:29.559 INFO CHUNK_DELETE The chunk 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89 has been permanently removed

was output by the prune command run right after the scheduled backup command and included in the log file backup-log-20180730-230000.

Can you find out which revision references this missing chunk? The log file prune-log-20180730-230324 should tell if this revision is seen by the prune command or not.
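
One way to find it, if you have the CLI handy, is to list the chunks referenced by each revision and search for the hash; if I recall the options correctly, something like:

duplicacy list -a -chunks > chunks.txt
grep -n 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89 chunks.txt

and then look at the nearest preceding "Snapshot ... revision ..." line in chunks.txt.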

This makes sense.

When I ran duplicacy -d check -a, it reported that a lot of chunks in revision 133 are missing. All other revisions seem OK. Revision 133 is neither the first nor the last revision.
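
To double-check a single revision, I believe check can also be pointed at just that revision:

duplicacy check -id snapshot-name -r 133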

The log file prune-log-20180730-230324 contains a lot of lines like:

Deleted fossil 737dcbc402cbf1b455c963c504738c00405f120c8ec1807f91289bdd3d1bfa89 (collection 1)

including for the chunk in question, and a few lines like

Marked fossil 7fa7b214fbd944a128f5cfddaea4940aa0f6cec80db24d61ba86b966d9a060b3

There are no references to revisions in that file.

If you’re running 2.1.1, then you should see a line in the log file like Snapshot xxx revision xxx was created after collection xxx.

If you don’t see this line in the log file, then it is likely that you’re still running an earlier version affected by this bug: Pruning Vertical Backup storage gone awry · Issue #465 · gilbertchen/duplicacy · GitHub

I’m running 2.1.1, which I updated to on Jul 24, 2018 (according to the Date Added attribute in macOS), and the revision in question was created on Jul 27:

Snapshot snapshot-name revision 133 created at 2018-07-27 05:00

and this is the only duplicacy binary on this machine, symlinked into the command-line path on the same date:

mymac:~ me$ ls -alt `which duplicacy`
lrwxr-xr-x  1 root  wheel  70 Jul 24 22:22 /usr/sbin/duplicacy@ -> /Applications/Duplicacy.app/Contents/Resources/duplicacy_osx_x64_2.1.1

Some of the prune logs contain the “was created after collection” phrase, but not all:

mymac:logs me$ g "created after collection" | sort
./prune-log-20180724-230248:1:Snapshot snapshotname revision 114 was created after collection 1
./prune-log-20180728-231111:1:Snapshot snapshotname revision 129 was created after collection 1
./prune-log-20180729-002211:1:Snapshot snapshotname revision 148 was created after collection 2
./prune-log-20180730-170505:1:Snapshot snapshotname revision 163 was created after collection 1
./prune-log-20180730-230324:1:Snapshot snapshotname revision 174 was created after collection 1

The last line is from the file that lists the deleted chunk.

Edit: I deleted revision 133 and now check reports a similar missing-chunks issue with revision 134. I’ll keep removing revisions until check passes.
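
For the record, I’m removing them with prune’s -r option:

duplicacy prune -id snapshot-name -r 133

(If I read the docs right, -r should also accept a range such as -r 133-134, which would save some round trips.)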

In the meantime I have an idea about what might have gone wrong.
Question: what happens if the connection is lost mid-prune? Does the prune command first delete the snapshot and then erase the chunks, or the other way around? In the latter case, if the connection is lost after the chunks were deleted but the snapshot was not, the stale snapshot will have exactly this issue.

Right, the prune command erases/fossilizes chunks first and then deletes the snapshot files, so if the connection is lost before the snapshot files are deleted, those snapshot files will reference missing chunks. The other way around would be better: it would just leave some unreferenced chunks in this case.

Awesome!

Some more data: the following revisions are also corrupted: 135, 136, 137, 138, 139, 140, 141.

Looking at the snapshot list, all of those sit right at the boundary of the hourly/daily retention prune (an illustrative -keep schedule follows the list):

Snapshot snapshot-name revision 103 created at 2018-07-22 23:00
Snapshot snapshot-name revision 112 created at 2018-07-23 23:00
Snapshot snapshot-name revision 117 created at 2018-07-24 23:00
Snapshot snapshot-name revision 123 created at 2018-07-25 23:00
Snapshot snapshot-name revision 127 created at 2018-07-26 23:00
Snapshot snapshot-name revision 135 created at 2018-07-27 07:00
Snapshot snapshot-name revision 136 created at 2018-07-27 08:00
Snapshot snapshot-name revision 137 created at 2018-07-27 10:03
Snapshot snapshot-name revision 138 created at 2018-07-27 14:21
Snapshot snapshot-name revision 139 created at 2018-07-27 15:00
Snapshot snapshot-name revision 140 created at 2018-07-27 20:29
Snapshot snapshot-name revision 141 created at 2018-07-27 21:00
Snapshot snapshot-name revision 142 created at 2018-07-27 22:47
Snapshot snapshot-name revision 143 created at 2018-07-27 22:53
Snapshot snapshot-name revision 144 created at 2018-07-27 23:00
Snapshot snapshot-name revision 145 created at 2018-07-27 23:15
Snapshot snapshot-name revision 146 created at 2018-07-27 23:33
Snapshot snapshot-name revision 147 created at 2018-07-28 23:07
Snapshot snapshot-name revision 148 created at 2018-07-29 00:10
Snapshot snapshot-name revision 149 created at 2018-07-29 02:00
Snapshot snapshot-name revision 150 created at 2018-07-29 03:00
Snapshot snapshot-name revision 151 created at 2018-07-29 05:08
Snapshot snapshot-name revision 152 created at 2018-07-29 06:00
Snapshot snapshot-name revision 153 created at 2018-07-29 07:00
Snapshot snapshot-name revision 154 created at 2018-07-29 08:00
Snapshot snapshot-name revision 155 created at 2018-07-29 09:00
Snapshot snapshot-name revision 156 created at 2018-07-29 10:00
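
For context, the retention schedule is essentially “keep everything (hourly) for about a week, then thin to daily”, i.e. a -keep policy roughly of this shape (the numbers here are illustrative, not my exact settings):

duplicacy prune -keep 0:180 -keep 7:30 -keep 1:7

(-keep n:m means: keep one snapshot every n days for snapshots older than m days, with n = 0 meaning delete them entirely.)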

And the machine is a laptop that likes to go to sleep and lose the VPN connection to the backup destination, which is consistent with the theorized root cause! Phew. Crisis averted 🙂

Thank you very much for such a prompt response. And yes, I agree: perhaps it would be better to nuke the snapshots first, at the risk of leaving unreferenced chunks lying around in these cases, because stale snapshots cause subsequent prune operations to fail and result in far more unnecessary data being kept in storage.