Sometimes, when you run the check command, it may complain about missing chunks:
$ duplicacy check
Storage set to sftp://gchen@192.168.1.125/AcrosyncTest/teststorage
Listing all chunks
Chunk 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 referenced by snapshot test at revision 1 does not exist
Some chunks referenced by snapshot test at revision 1 are missing
Other commands can also report the same missing-chunk messages. If that happens, it is recommended to run the check command instead, as it can identify all missing chunks at once for a given snapshot without any side effects.
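For example, to check every snapshot from every repository id in the storage at once, run the check command with the -a option (the same option referenced later in this guide) from an initialized repository:
$ duplicacy check -a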
Clear the cache
One common cause of missing chunks is a stale cache. This can happen, for example, when a revision is manually removed from the storage and a new backup is then uploaded with the same revision number. The revision file stored in the cache is still the old one and thus may reference chunks that have already been deleted.
For this reason, the first thing to do when you see the above error is to completely remove the cache directory. The cache is usually located in the .duplicacy/cache directory under the current repository. For the location of the cache in the web GUI, please refer to Cache usage details.
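For the CLI, a minimal way to clear the cache is to delete that directory and rerun the check (the path below assumes the default cache location inside the repository):
$ rm -rf .duplicacy/cache
$ duplicacy check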
Check whether the missing chunk actually exists on the storage
Note: a chunk is stored in a folder named after the first two characters of the chunk name. For example, chunk
02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5
is in the folder 02, stored as the file c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5.
If the same command with a clean cache still produces the same error, the next step is to check by hand whether those chunks actually exist on the storage. Some cloud storage services (such as OneDrive and Hubic) have a bug that prevents the complete chunk list from being returned. In other cases, a chunk may be stored in the wrong folder. For instance, the expected path for the chunk 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 may be chunks\02\c2\25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5, but if it were stored as chunks\02\c225aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5, Duplicacy would have difficulty locating it.
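As a sketch, for the SFTP storage from the example at the top you could list the expected chunk location directly (the host, storage path, and folder layout below are assumptions taken from that example; adjust them to your storage and its actual folder nesting):
$ ssh gchen@192.168.1.125 ls AcrosyncTest/teststorage/chunks/02/ | grep c25aea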
Check whether the chunk was deleted by prune
If a chunk reported as missing in fact does not exist in the storage, then you may need to find out why it is missing. The prune command is the only command that can delete chunks, and by default Duplicacy always produces a prune log and saves it under the .duplicacy/logs folder.
Here is a sample prune log:
$ cat .duplicacy/logs/prune-log-20180124-205159
Deleted chunk 2302e87bf0a8c863112bbdcd4d7e94e8a12a9939defaa8a3f30423c791119d4c (exclusive mode)
Deleted chunk 7aa4f3192ecbf5a67f52a2e791cfac445116658ec1e3bd00f8ee35dda6964fb3 (exclusive mode)
Deleted chunk 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 (exclusive mode)
Deleted chunk dbbd5c008e107703e59d8f6633d89f9a55075fa6695c113a2f191dd6cddacb53 (exclusive mode)
Deleted chunk 611c478edcc4201f8b48e206391e9929359e71eb31691afc23fb059418d53fb5 (exclusive mode)
Deleted chunk 297dcc3d83dc05b8e697535306a3af847435874cbe7d5a6b5e6918811d418649 (exclusive mode)
Deleted cached snapshot test at revision 1
This log indicates that these chunks were removed when the prune command was invoked with the -exclusive option, because these chunks were only referenced by the snapshot to be deleted, and the -exclusive option assumes there weren't any other ongoing backups.
This is an excerpt from another prune log:
Marked fossil 909a14a87d185b11ec933dba7069fc2b3744288bb169929a3fc096879348b4fc
Marked fossil 0e92f9aa69cc98cd3228fcfaea480585fe1ab64b098b86438a02f7a3c78e797a
Marked fossil 3ab0be596614dd39bcacc2279d49b6fc1e0095c71b594c509a7b5d504d6d111e
Marked fossil a8a1377cab0dd7f25cac4ac3fb451b9948f129904588d9f9b67bead7c878b7d0
These chunks weren't immediately removed but rather marked as fossils. This is because an ongoing backup not yet seen by the prune command may reference any of these chunks. To be safe, the prune command turns them into fossils, which are either permanently removed later if no such backup exists, or turned back into normal chunks otherwise. Please refer to Lock free deduplication algorithm for a detailed explanation of this technique.
If you can find the missing chunk in any of these prune logs (check all the computers that back up to or prune this storage!), then it is clear that the prune command either removed it in exclusive mode or marked it as a fossil (which may be removed at a later time). If you think the prune command mistakenly removed or marked the chunk due to a bug, post a bug report in the forum with the relevant logs attached.
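A quick way to search all prune logs for a particular chunk is to grep for its hash (run this on each of those computers; the hash below is the one from the example):
$ grep 02c25aea4621acdd4c8751d5ab7ff438fb47308ce8738f030b7db0741c37ecb5 .duplicacy/logs/prune-log-*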
Exceptional cases
Please be aware that there are some corner cases in which a fossil that is still needed may be mistakenly deleted.
Backups lasting longer than 7 days
If a repository is running a backup that takes more than 7 days, and that backup started before the chunk was marked as a fossil, then the prune command will consider that repository inactive and exclude it from the criteria for determining which fossils are safe to delete.
Initial backups
The other case happens when an initial backup from a new repository starts before the chunk was marked as a fossil. Since the prune command doesn't know about the existence of such a repository at fossil deletion time, it may conclude that the fossil isn't needed any more by any snapshot and thus delete it permanently.
-exclusive mode
If you see from the log that a missing chunk was deleted in exclusive mode, it means that the prune command was incorrectly invoked with the -exclusive option while there was still a backup in progress from a different computer to the same storage.
In all these cases, running a check command after the backup finishes will immediately reveal the missing chunk.
What if the missing chunk can’t be found in any of these prune logs? We may not be able to track down who the culprit was. It could be a bug in Duplicacy, or a bug in the cloud storage service, or it could be a user error. If you do not want to see this happen again, you may need to run a check
command after every backup or before every prune.
Fixing a missing chunk
Is it possible to recover a missing chunk? Maybe, if the backup the missing chunk comes from was done recently and the files in that backup haven't changed since. In this case, you can modify the .duplicacy/preferences file to assign the repository a new id that hasn't been used by any repository connecting to the same storage, and then run a new backup. This backup will be an initial backup because of the new repository id and will therefore attempt to upload all chunks that do not exist in the storage. If you are lucky, this procedure will produce an identical copy of the missing chunk.
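As a rough sketch, the relevant change is the id field of the repository entry in .duplicacy/preferences; the fields other than id and the new id test-recovery below are illustrative, not an exact reproduction of your file:
[
    {
        "name": "default",
        "id": "test-recovery",
        "storage": "sftp://gchen@192.168.1.125/AcrosyncTest/teststorage",
        "encrypted": false
    }
]
After saving the change, run a new backup with duplicacy backup as usual.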
If you are not interested in figuring out why the chunk went missing and just want to fix the issue, you can keep removing by hand the affected snapshot files under the snapshots folder in the storage, until the check -a command passes without reporting missing chunks. At that point, you should be able to run new backups. However, there will likely be many unreferenced chunks left in the storage. To fix this, run prune -exhaustive, which will identify all unreferenced chunks and mark them as fossils for removal by a subsequent prune command. Or, if you're sure that no other backups are running, prune -exhaustive -exclusive can remove these unreferenced chunks immediately.
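For example, with the SFTP storage and the snapshot test at revision 1 from the beginning of this guide, the manual cleanup could look like the following (the host, storage path, and revision number are assumptions taken from that example):
$ ssh gchen@192.168.1.125 rm AcrosyncTest/teststorage/snapshots/test/1
$ duplicacy check -a
$ duplicacy prune -exhaustive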
Can I just rm the invalid chunk from the storage, so the next backup will re-upload it (if necessary)?
Simply deleting the bad chunks won’t get Duplicacy to re-upload them, so this is not a solution.