Fix missing chunks

check

#1

Sometimes, when you run the check command, it may complain about missing chunks:

$ duplicacy check
Storage set to sftp://gchen@192.168.1.125/AcrosyncTest/teststorage
Listing all chunks
Chunk 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 referenced by snapshot test at revision 1 does not exist
Some chunks referenced by snapshot test at revision 1 are missing

All other commands can also report the same missing chunk messages. If that happens, it is recommended to run the check command instead as it can identify all missing chunks at once for a given snapshot, without any side effects.

Check the storage if the missing chunk actually exists on the storage

The first thing to do in this situation is to check by hand if those chunks actually exist on the storage. Some cloud storage services (such as OneDrive and Hubic) have a bug that prevents the complete chunk list to be returned. In other cases, a chunk may be stored in a wrong folder. For instance, the expected path for the chunk 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 may be chunks\02\c2\25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5, but if it were stored as chunks\02\c225aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5, Duplicacy would have difficulty locating it.

Check if the chunk was not deleted by prune

If a chunk reported as missing in fact does not exist in the storage, then you may need to find out why it is missing.

The prune command is the only command that can delete chunks, and by default Duplicacy always produces a prune log and saved it under the .duplicacy/logs folder.

Here is a sample prune log:

$cat .duplicacy/logs/prune-log-20180124-205159
Deleted chunk 2302e87bf0a8c863112bbdcd4d7e94e8a12a9939defaa8a3f30423c791119d4c (exclusive mode)
Deleted chunk 7aa4f3192ecbf5a67f52a2e791cfac445116658ec1e3bd00f8ee35dda6964fb3 (exclusive mode)
Deleted chunk 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 (exclusive mode)
Deleted chunk dbbd5c008e107703e59d8f6633d89f9a55075fa6695c113a2f191dd6cddacb53 (exclusive mode)
Deleted chunk 611c478edcc4201f8b48e206391e9929359e71eb31691afc23fb059418d53fb5 (exclusive mode)
Deleted chunk 297dcc3d83dc05b8e697535306a3af847435874cbe7d5a6b5e6918811d418649 (exclusive mode)
Deleted cached snapshot test at revision 1

This log indicates that these chunks were removed when the prune command was invoked with the -exclusive option, because these chunks are only referenced by the snapshot to be deleted, and the -exclusive assumes there weren’t any other ongoing backups.

This is an excerpt from another prune log:

Marked fossil 909a14a87d185b11ec933dba7069fc2b3744288bb169929a3fc096879348b4fc
Marked fossil 0e92f9aa69cc98cd3228fcfaea480585fe1ab64b098b86438a02f7a3c78e797a
Marked fossil 3ab0be596614dd39bcacc2279d49b6fc1e0095c71b594c509a7b5d504d6d111e
Marked fossil a8a1377cab0dd7f25cac4ac3fb451b9948f129904588d9f9b67bead7c878b7d0

These chunks weren’t immediately removed but rather marked as fossils. This is because another ongoing backup that was seen by the prune command may reference any of these chunks. To be safe, the prune command will turn them into fossils, which can be either permanently removed if no such backup exists, or turned back into normal chunks otherwise. Please refer to Lock free deduplication algorithm for a detailed explanation of this technique.

If you can find the missing chunk in any of these prune logs (on all the computers which backup and prune to this storage!), then it is clear that the prune command removed it in the exclusive mode or marked it as a fossil (which may be removed at a later time). If you think the prune command mistakenly removed or marked the chunk due to a bug, then submit a github or forum issue with relevant logs attached.

:exclamation: Exceptional cases

Please be aware there are some corner cases when a fossil still needed may be mistakenly deleted.

Backups lasting longer than 7 days

If there is a repository doing a backup which takes more than 7 days and the backup started before the chunk was marked as fossil, then the prune command will think that that particular repository becomes inactive and will be excluded from the criteria for determining safe fossils to be deleted.

Initial backups

The other case happens when an initial backup from a newly recreated repository that also started before the chunk was marked as fossil. Since the prune command doesn’t know the existence of such a repository at the fossil deletion time, it may think the fossil isn’t needed any more by any snapshot and thus delete it permanently.

-exclusive mode

If you see from the log that a missing chunk was deleted in exclusive mode, then it means that the prune command was incorrectly invoked with the -exclusive option, while there was still a backup in progress from a different computer to the same storage.

Fixing a missing chunk

In all these cases, a check command after the backup finishes will immediately reveals the missing chunk.

What if the missing chunk can’t be found in any of these prune logs? We may not be able to track down who the culprit was. It could be a bug in Duplicacy, or a bug in the cloud storage service, or it could be a user error. If you do not want to see this happen again, you may need to run a check command after every backup or before every prune.

Is it possible to recover a missing chunk? Maybe, if the backup where the missing chunk comes from was done recently and the files in that backup haven’t changed since the backup. In this case, you can modify the .duplicacy/preferences file to assign to the repository a new id that hasn’t been used by any repositories connecting to the same storage, and then run a new backup. This backup will be an initial backup because of the new repository id and therefore attempt to upload all chunks that do not exist in the storage. If you are lucky, this procedure will be able to produce an identical copy of the missing chunk.

If you are uninterested in figuring out why the chunk went missing and just want to fix the issue, you can keep removing by hand the affected snapshot files under the snapshots folder in the storage, until the check -a command passes without reporting missing chunks. At this time, you should be able to run new backups. However, there will likely be many unreferenced chunks in the storage. To fix this, run prune -exhaustive and all unreferenced chunks will be identified and marked as fossils for removal by a subsequent prune command. Or if you’re very sure that no other backups are running, prune -exhaustive -exclusive can remove these unreferenced chunks immediately.


Backups have stopped, no emails reporting it and missing chunks
Lots of missing chunks with pCloud
Prune operation returned an error, FATAL: chunk xxxx can't be found
If you're running prune and backup in parallel, please upgrade to 2.1.1
Duplicacy User Guide
Missing chunks (ZFS volume)
Deleted chunk during backup
#2

4 posts were split to a new topic: Missing chunks (ZFS volume)


#3

@gchen: hasn’t this been fixed when you added the support for multiple nested levels for the chunks folder?

From what i remember (and i have a bad memory) duplicacy now searches for some versions by default (even if that file which contains the nested levels doesn’t exist).


#4

@gchen is there a need for the repo to be re -created? (i just want to delete the “re” if it’s ok)