Sometimes, when you run the check command, it may complain about missing chunks:
    $ duplicacy check
    Storage set to sftp://firstname.lastname@example.org/AcrosyncTest/teststorage
    Listing all chunks
    Chunk 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 referenced by snapshot test at revision 1 does not exist
    Some chunks referenced by snapshot test at revision 1 are missing
Other commands can report the same missing chunk messages. If that happens, it is recommended to run the check command instead, because it identifies all missing chunks for a given snapshot at once and has no side effects.
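When many chunks are reported, it can help to pull just the chunk IDs out of the check output. The following is a minimal sketch that assumes the message format shown above and that the output was saved to a file; the function name and the file name check.log are illustrative, not part of Duplicacy.

```sh
# Print the unique IDs of chunks that a saved `duplicacy check` output
# reports as missing. Assumes lines of the form
#   Chunk <hex id> referenced by snapshot ... does not exist
list_missing_chunks() {
    # $1: file holding the saved check output (hypothetical name)
    sed -n 's/^.*Chunk \([0-9a-f][0-9a-f]*\) referenced by snapshot .* does not exist.*$/\1/p' "$1" | sort -u
}

# Example usage:
#   duplicacy check > check.log 2>&1
#   list_missing_chunks check.log
```

The resulting list of IDs can then be used in the steps that follow.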
Clear the cache
One common cause of missing chunks is a stale cache. This can happen for example when a revision is manually removed from the storage and then a new backup is uploaded with the same revision number. The revision file stored in the cache is still the old one and thus may reference some chunks that have already been deleted.
For this reason, the first thing to do when you see the above error is to remove the cache directory completely. The cache is usually located in the .duplicacy/cache directory under the current repository. For the location of the cache in the web GUI, please refer to Cache usage details.
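As a rough sketch of this step for the command line version, assuming the cache is at the usual .duplicacy/cache location under the repository root (the function name is illustrative):

```sh
# Remove the local Duplicacy cache for one repository.
# Only the cache directory is touched; the rest of .duplicacy is left alone.
clear_duplicacy_cache() {
    # $1: repository root (defaults to the current directory)
    cache_dir="${1:-.}/.duplicacy/cache"
    if [ -d "$cache_dir" ]; then
        rm -rf -- "$cache_dir"
        echo "Removed $cache_dir"
    else
        echo "No cache found at $cache_dir"
    fi
}

# Example usage, run from the repository root:
#   clear_duplicacy_cache . && duplicacy check
```

Running check again right after clearing the cache shows whether the stale cache was the cause.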
Check whether the missing chunk actually exists on the storage
If the same command with a clean cache still produces the same error, the next step is to check by hand whether those chunks actually exist on the storage. Some cloud storage services (such as OneDrive and Hubic) have a bug that prevents the complete chunk list from being returned. In other cases, a chunk may be stored in the wrong folder. For instance, the expected path for the chunk
02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 may be
chunks\02\c2\5aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5, but if it were stored as
chunks\02\c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5, Duplicacy would have difficulty locating it.
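To know where to look, the candidate paths for a chunk ID can be generated rather than typed by hand. This is a sketch only: it assumes each directory level takes the next two characters of the ID, so that the path components concatenate back to the full ID, and it treats the number of levels as a parameter because that depends on the storage. It prints forward slashes; the function name is illustrative.

```sh
# Print the path a chunk would have with a given number of
# two-character directory levels under the chunks folder.
expected_chunk_path() {
    # $1: chunk ID (hex string); $2: number of directory levels (default 1)
    id=$1
    levels=${2:-1}
    path=chunks
    while [ "$levels" -gt 0 ]; do
        prefix=$(printf '%s' "$id" | cut -c1-2)   # next two characters
        id=$(printf '%s' "$id" | cut -c3-)        # what is left of the ID
        path="$path/$prefix"
        levels=$((levels - 1))
    done
    printf '%s/%s\n' "$path" "$id"
}

# Example usage:
#   expected_chunk_path 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 2
```

Comparing the output for one and two levels against what is actually in the storage shows quickly whether a chunk landed at the wrong nesting depth.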
Check whether the chunk was deleted by prune
If a chunk reported as missing in fact does not exist in the storage, then you may need to find out why it is missing.
The prune command is the only command that can delete chunks, and by default Duplicacy always produces a prune log and saves it under the .duplicacy/logs folder.
Here is a sample prune log:
    $ cat .duplicacy/logs/prune-log-20180124-205159
    Deleted chunk 2302e87bf0a8c863112bbdcd4d7e94e8a12a9939defaa8a3f30423c791119d4c (exclusive mode)
    Deleted chunk 7aa4f3192ecbf5a67f52a2e791cfac445116658ec1e3bd00f8ee35dda6964fb3 (exclusive mode)
    Deleted chunk 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 (exclusive mode)
    Deleted chunk dbbd5c008e107703e59d8f6633d89f9a55075fa6695c113a2f191dd6cddacb53 (exclusive mode)
    Deleted chunk 611c478edcc4201f8b48e206391e9929359e71eb31691afc23fb059418d53fb5 (exclusive mode)
    Deleted chunk 297dcc3d83dc05b8e697535306a3af847435874cbe7d5a6b5e6918811d418649 (exclusive mode)
    Deleted cached snapshot test at revision 1
This log indicates that these chunks were removed when the prune command was invoked with the -exclusive option. They were deleted immediately because they were referenced only by the snapshot being deleted, and the -exclusive option assumes that no other backups are in progress.
This is an excerpt from another prune log:
    Marked fossil 909a14a87d185b11ec933dba7069fc2b3744288bb169929a3fc096879348b4fc
    Marked fossil 0e92f9aa69cc98cd3228fcfaea480585fe1ab64b098b86438a02f7a3c78e797a
    Marked fossil 3ab0be596614dd39bcacc2279d49b6fc1e0095c71b594c509a7b5d504d6d111e
    Marked fossil a8a1377cab0dd7f25cac4ac3fb451b9948f129904588d9f9b67bead7c878b7d0
These chunks weren’t immediately removed but rather marked as fossils. This is because another ongoing backup that was seen by the prune command may reference any of these chunks. To be safe, the prune command will turn them into fossils, which can be either permanently removed if no such backup exists, or turned back into normal chunks otherwise. Please refer to Lock free deduplication algorithm for a detailed explanation of this technique.
If you can find the missing chunk in any of these prune logs (on all the computers that back up and prune to this storage!), then it is clear that the prune command removed it in exclusive mode or marked it as a fossil (which may be removed at a later time). If you think the prune command mistakenly removed or marked the chunk due to a bug, post a bug report in the forum with the relevant logs attached.
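Searching the logs by hand gets tedious with many log files. A minimal sketch of the search, assuming the logs follow the prune-log-* naming seen in the sample and sit in .duplicacy/logs (the function name is illustrative):

```sh
# Show every prune log line that mentions a chunk ID, prefixed with
# the name of the log file it came from. Returns non-zero if the
# chunk appears in no log.
find_chunk_in_prune_logs() {
    # $1: chunk ID; $2: log directory (defaults to .duplicacy/logs)
    grep -H -- "$1" "${2:-.duplicacy/logs}"/prune-log-* 2>/dev/null
}

# Example usage, run from the repository root:
#   find_chunk_in_prune_logs 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5
```

A "Deleted chunk" hit points to exclusive mode, while a "Marked fossil" hit points to the two-step fossil deletion. The search has to be repeated on every computer that prunes this storage.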
Please be aware that there are some corner cases in which a fossil that is still needed may be mistakenly deleted.
Backups lasting longer than 7 days
If a repository is running a backup that takes more than 7 days, and the backup started before the chunk was marked as a fossil, then the prune command will assume that the repository has become inactive and will exclude it from the criteria for determining which fossils are safe to delete.
The other case arises when an initial backup from a newly recreated repository also started before the chunk was marked as a fossil. Since the prune command doesn’t know that such a repository exists at fossil deletion time, it may conclude that the fossil is no longer needed by any snapshot and delete it permanently.
If you see from the log that a missing chunk was deleted in exclusive mode, then it means that the prune command was incorrectly invoked with the
-exclusive option, while there was still a backup in progress from a different computer to the same storage.
Fixing a missing chunk
In all these cases, a check command run after the backup finishes will immediately reveal the missing chunk.
What if the missing chunk can’t be found in any of these prune logs? We may not be able to track down who the culprit was. It could be a bug in Duplicacy, or a bug in the cloud storage service, or it could be a user error. If you do not want to see this happen again, you may need to run a
check command after every backup or before every prune.
Is it possible to recover a missing chunk? Maybe, if the backup where the missing chunk comes from was done recently and the files in that backup haven’t changed since the backup. In this case, you can modify the
.duplicacy/preferences file to assign to the repository a new id that hasn’t been used by any repositories connecting to the same storage, and then run a new backup. This backup will be an initial backup because of the new repository id and therefore attempt to upload all chunks that do not exist in the storage. If you are lucky, this procedure will be able to produce an identical copy of the missing chunk.
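The edit to the preferences file could be scripted roughly as below. This is a sketch under assumptions: it assumes the file stores the repository id as a JSON field written like "id": "test", and that both ids contain no characters that are special to sed. Check the file by hand before relying on it. The function name and the id test-recover are illustrative.

```sh
# Replace the repository id in a preferences file, keeping a backup
# copy next to it as <file>.bak.
set_repository_id() {
    # $1: preferences file; $2: current id; $3: new, unused id
    prefs=$1
    cp -- "$prefs" "$prefs.bak" || return 1
    # Assumes the id is stored as: "id": "<value>"
    sed "s/\"id\": *\"$2\"/\"id\": \"$3\"/" "$prefs.bak" > "$prefs"
}

# Example usage, run from the repository root:
#   set_repository_id .duplicacy/preferences test test-recover
#   duplicacy backup
```

Once the backup under the new id has finished, running check against the original snapshot shows whether the missing chunk was recreated.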
If you are uninterested in figuring out why the chunk went missing and just want to fix the issue, you can keep removing by hand the affected snapshot files under the
snapshots folder in the storage, until the
check -a command passes without reporting missing chunks. At this time, you should be able to run new backups. However, there will likely be many unreferenced chunks in the storage. To fix this, run
prune -exhaustive and all unreferenced chunks will be identified and marked as fossils for removal by a subsequent prune command. Or if you’re very sure that no other backups are running,
prune -exhaustive -exclusive can remove these unreferenced chunks immediately.
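The final two steps can be chained so that the cleanup only runs once the storage is consistent. The sketch below assumes that check -a exits with a non-zero status while chunks are still missing; verify that on your version before relying on it. The function name and the DUPLICACY variable are illustrative.

```sh
# Path to the duplicacy executable; override if it is not on PATH.
DUPLICACY="${DUPLICACY:-duplicacy}"

cleanup_after_snapshot_removal() {
    # Refuse to prune while any snapshot still references a missing chunk.
    if ! "$DUPLICACY" check -a; then
        echo "check -a still reports problems; remove more snapshot files first" >&2
        return 1
    fi
    # Marks unreferenced chunks as fossils; a later prune removes them.
    "$DUPLICACY" prune -exhaustive
}

# Example usage, run from the repository root:
#   cleanup_after_snapshot_removal
```

The sketch deliberately leaves out -exclusive, since that option is only safe when no other backup is running.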