This was great advice, and I was able to resolve the problems with my storage. I wrote a note for my future self, which I would like to share here; let me know if you think something is not represented correctly. I hope this can be useful to other people:
1. Web resources
- Forum discussion: link about the problem described in this note.
- Fix missing chunks: the guide assumes that we are able to recover the missing chunks. Sometimes, however, we do not want to recover them because they were removed correctly.
- Feature request for a flag option to prune snapshots regardless of whether some chunks are missing. This feature request has not been implemented.
2. Description of the problem
When carrying out backups with duplicacy, it may happen that some of the chunks are missing for a given snapshot. This can occur:
- if the storage is defective and files disappear (see also how to fix missing chunks). One could try to run a check with the -fossils and -resurrect options;
- if prune operations are carried out with the -exclusive flag from one computer while backups or other operations on the same storage are carried out from another computer;
- if a prune operation was started but then interrupted before completion. Chunks were then deleted, but the prune command stopped before deleting the snapshot revision that references them. This situation leaves behind a snapshot revision that can no longer be pruned: an error is shown, saying that certain chunks referenced by the snapshot revision do not exist. Unfortunately, there is no flag that allows duplicacy to delete these snapshot revisions regardless of whether chunks are missing; a feature request was made to be able to remove these snapshot revisions, but it has not been considered yet.
3. Forcible deletion of snapshot revisions
Sometimes, there is no way to recover missing chunks, for example because of an interrupted prune operation (see section 2), so there can be good reasons to forcibly remove snapshot revisions.
3.1. Content of a storage
In a storage, one should find three elements in the root: a config file, a folder snapshots and a folder chunks. We are mainly interested in the snapshots folder, but we briefly describe both folders:
3.1.1. Chunks
The folder chunks contains 256 folders, each named with a two-digit hexadecimal number. For a given chunk (e.g., 49305662730665976bd5b4c1454905b3a2d0c5a8953400d4a0e7e796c89df0b1), the first two characters (in the example, 49) denote the subfolder under chunks. The remaining characters (305662730665976bd5b4c1454905b3a2d0c5a8953400d4a0e7e796c89df0b1) denote the name of the file for the given chunk.
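As a minimal sketch of the naming rule above (assuming the single level of nesting described here; chunk_path is a hypothetical helper name, not part of duplicacy), the location of a chunk file can be derived from its ID like this:

```python
def chunk_path(chunk_id: str) -> str:
    """Return the storage-relative path of a chunk, assuming one level of nesting."""
    chunk_id = chunk_id.strip().lower()
    # A chunk ID is a hexadecimal string; reject anything else early.
    if len(chunk_id) <= 2 or any(c not in "0123456789abcdef" for c in chunk_id):
        raise ValueError(f"not a hexadecimal chunk ID: {chunk_id!r}")
    # First two characters: subfolder under chunks. Remainder: file name.
    return f"chunks/{chunk_id[:2]}/{chunk_id[2:]}"


print(chunk_path("49305662730665976bd5b4c1454905b3a2d0c5a8953400d4a0e7e796c89df0b1"))
```

This is only meant for locating a file by hand, for example to confirm that a chunk reported as missing is really absent from the storage.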
Typically, we do not need to operate manually on the chunk files. Once we have removed incomplete or corrupted snapshots, it is sufficient to run a prune operation on all snapshots (-all option) with the -exhaustive option. This will remove all orphaned chunks.
3.1.2. Snapshots
The folder snapshots contains a series of subfolders, one for each snapshot (backup set) that is backed up. Inside each snapshot folder, we find a series of files, each named after the number of a snapshot revision. The files are binary and cannot simply be opened in a text editor. If we want to see what they contain (which is usually not necessary), we can run:
duplicacy cat -id <snapshot_name> -r <revision_number>
3.2. Manual deletion of snapshot revisions
We can find the incomplete/corrupt snapshot revisions by running:
duplicacy check -all -persist
As an alternative to -all, we can choose a particular snapshot with the -id option if we know where the problems are and want to speed up the process.
It is important to select -persist, or else duplicacy will give up on the first error reported.
Now, the output of the check command given above will look similar to:
Repository set to /Users/my_home_dir
Storage set to b2://my-backup
Download URL is: https://f002.backblazeb2.com
Listing all chunks
17 snapshots and 1046 revisions
Total chunk size is 506,311M in 130956 chunks
All chunks referenced by snapshot xyz1 at revision 18 exist
All chunks referenced by snapshot xyz1 at revision 20 exist
All chunks referenced by snapshot xyz2 at revision 121 exist
All chunks referenced by snapshot xyz2 at revision 157 exist
All chunks referenced by snapshot xyz2 at revision 193 exist
Chunk 6168db045e48a09d4a02d0e4d8cf0a5a031631285d92a1b418ce33c8cc709b00 referenced by snapshot xyz2 at revision 210 does not exist
Some chunks referenced by snapshot xyz2 at revision 210 are missing
All chunks referenced by snapshot xyz2 at revision 229 exist
All chunks referenced by snapshot xyz2 at revision 262 exist
All chunks referenced by snapshot xyz2 at revision 284 exist
All chunks referenced by snapshot xyz3 at revision 1 exist
All chunks referenced by snapshot xyz3 at revision 1030 exist
All chunks referenced by snapshot xyz3 at revision 1375 exist
All chunks referenced by snapshot xyz3 at revision 2814 exist
Chunk 04457b7d84b3278a23f5c6de4a87069368c9c3fa0ac18e7792a907e134516745 referenced by snapshot xyz3 at revision 2815 does not exist
Some chunks referenced by snapshot xyz3 at revision 2815 are missing
All chunks referenced by snapshot xyz3 at revision 2816 exist
All chunks referenced by snapshot xyz3 at revision 2818 exist
Some chunks referenced by some snapshots do not exist in the storage
In the example, two snapshot revisions have missing chunks: xyz2 at revision 210 and xyz3 at revision 2815. There is currently no way to remove these snapshot revisions with the duplicacy CLI. However, we can go to the storage and manually delete the revision files of the incomplete/corrupt snapshots.
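To see at a glance which revision files would have to be deleted, the check output can be parsed without touching the storage. The following is a sketch under the assumption that the messages have exactly the format shown in the sample output above; affected_revisions is a hypothetical helper name:

```python
import re

# Message format taken from the sample check output; adjust if yours differs.
MISSING = re.compile(
    r"Chunk \S+ referenced by snapshot (.+?) at revision (\d+) does not exist"
)


def affected_revisions(log_text):
    """Return sorted, de-duplicated (snapshot, revision) pairs with missing chunks."""
    found = set()
    for line in log_text.splitlines():
        m = MISSING.search(line)
        if m:
            found.add((m.group(1), int(m.group(2))))
    return sorted(found)


sample = """\
All chunks referenced by snapshot xyz2 at revision 193 exist
Chunk 6168db045e48a09d4a02d0e4d8cf0a5a031631285d92a1b418ce33c8cc709b00 referenced by snapshot xyz2 at revision 210 does not exist
Some chunks referenced by snapshot xyz2 at revision 210 are missing
Chunk 04457b7d84b3278a23f5c6de4a87069368c9c3fa0ac18e7792a907e134516745 referenced by snapshot xyz3 at revision 2815 does not exist
Some chunks referenced by snapshot xyz3 at revision 2815 are missing
"""

for snapshot, revision in affected_revisions(sample):
    # Revision files live under snapshots/<snapshot>/<revision> in the storage.
    print(f"snapshots/{snapshot}/{revision}")
# prints:
# snapshots/xyz2/210
# snapshots/xyz3/2815
```

Since a revision with several missing chunks produces several "does not exist" lines, collecting the pairs in a set ensures each revision file is listed only once.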
If the number of files to be deleted is very large, it is worth using a script, such as the Python script I wrote for Backblaze B2 storage. The script assumes that the output of the check operation is stored in a local file named duplicacy_check_log.txt:
import re
from b2sdk.v2 import AuthInfoCache, B2Api, InMemoryAccountInfo
ACCOUNT_ID = '123123132'
APPLICATION_KEY = '012340123401234012340123401234'
info = InMemoryAccountInfo()
b2_api = B2Api(info, cache=AuthInfoCache(info))
b2_api.authorize_account("production", ACCOUNT_ID, APPLICATION_KEY)
bucket_name = 'my-backup-name'
bucket = b2_api.get_bucket_by_name(bucket_name)
# Define the regex pattern
pattern = r'Chunk .*? referenced by snapshot (.*?) at revision (\d+) does not exist'
# Initialize a set to keep track of revision files that were already processed
deleted_files = set()
# Open and read the log file
with open('duplicacy_check_log.txt', 'r') as file:
    for line in file:
        # Search each line for the pattern; search() also matches lines
        # that start with a prefix such as a timestamp
        match = re.search(pattern, line)
        if match:
            # Extract the snapshot name and revision number
            snapshot = match.group(1)
            revision = match.group(2)
            # Define the file name or path in the B2 bucket
            b2_file = f"snapshots/{snapshot}/{revision}"
            # Skip revisions that were already processed (one revision can miss several chunks)
            if b2_file in deleted_files:
                continue
            # Remember the file so that it is processed only once
            deleted_files.add(b2_file)
            try:
                # Fetch the file information and delete it
                file_info = bucket.get_file_info_by_name(b2_file)
                bucket.delete_file_version(file_info.id_, b2_file)
                print(f"Revision {revision} deleted from snapshot {snapshot}.")
            except Exception as e:
                # The file may be missing or the request may have failed; show the actual error
                print(f"Revision {revision} of snapshot {snapshot} could not be deleted: {e}")
Note that if you use a different storage, you would have to adapt the script.
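As an illustration of such an adaptation, here is a minimal sketch for a storage that lives on a local or mounted disk. It assumes the layout described in section 3.1 (snapshots/<snapshot>/<revision>); delete_revisions and the example path are hypothetical, and the function defaults to a dry run so that nothing is removed by accident:

```python
from pathlib import Path


def delete_revisions(storage_root, revisions, dry_run=True):
    """Delete snapshots/<snapshot>/<revision> files under a local storage root.

    revisions is an iterable of (snapshot, revision) pairs. Returns the list
    of paths that were deleted (or would be deleted when dry_run is True).
    """
    root = Path(storage_root)
    handled = []
    for snapshot, revision in revisions:
        target = root / "snapshots" / snapshot / str(revision)
        if not target.is_file():
            print(f"Revision {revision} not found in snapshot {snapshot}.")
            continue
        if dry_run:
            print(f"Would delete {target}")
        else:
            target.unlink()
            print(f"Revision {revision} deleted from snapshot {snapshot}.")
        handled.append(target)
    return handled


# Hypothetical usage: inspect the dry-run output first, then pass dry_run=False.
# delete_revisions("/mnt/backup/my-storage", [("xyz2", 210), ("xyz3", 2815)])
```

Running it once as a dry run and comparing the listed paths with the check output is a cheap safeguard, because a deleted revision file cannot be restored by duplicacy.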
3.3. Final check
After the incomplete/corrupt snapshot revisions have been deleted, it is important to run:
duplicacy prune -all -exhaustive
Use -exclusive in addition if you are sure that no other backup is currently running on the chosen storage.
Finally, you can run a check to verify that the storage is again in good shape:
duplicacy check -all -persist