Prune command details

The prune command has the task of deleting old/unwanted revisions and unused chunks from a storage.

A list of related forum topics can be found at the end of this page.

Quick overview

NAME:
   duplicacy prune - Prune snapshots by revision, tag, or retention policy

USAGE:
   duplicacy prune [command options]

OPTIONS:
   -id <snapshot id>            delete snapshots with the specified id instead of the default one
   -all, -a                     match against all snapshot IDs
   -r <revision> [+]            delete snapshots with the specified revisions
   -t <tag> [+]                 delete snapshots with the specified tags
   -keep <n:m> [+]              keep 1 snapshot every n days for snapshots older than m days
   -exhaustive                  remove all unreferenced chunks (not just those referenced by deleted snapshots)
   -exclusive                   assume exclusive access to the storage (disable two-step fossil collection)
   -dry-run, -d                 show what would have been deleted
   -delete-only                 delete fossils previously collected (if deletable) and don't collect fossils
   -collect-only                identify and collect fossils, but don't delete fossils previously collected
   -ignore <id> [+]             ignore snapshots with the specified id when deciding if fossils can be deleted
   -storage <storage name>      prune snapshots from the specified storage

Usage

duplicacy prune [command options]

Options

-id <snapshot id>

Delete snapshots with the specified id instead of the default one.

Example:
duplicacy prune -id computer-2

-all, -a

Run the prune command against all snapshot IDs in the selected storage.

Example:
duplicacy prune -all

-r <revision> [+]

Delete snapshots with the specified revisions.

Examples:
duplicacy prune -r 6              # delete revision 6
duplicacy prune -r 344-350        # delete revisions 344 through 350 (inclusive)
duplicacy prune -r 310 -r 1322    # delete only the revisions 310 and 1322

-t <tag> [+]

Delete snapshots with the specified tags.
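
Example (the tag name nightly is purely illustrative; substitute a tag that was actually assigned to your backups):
duplicacy prune -t nightly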

-keep <n:m> [+]

Keep 1 snapshot every n days for snapshots older than m days.

The retention policies are specified by the -keep option, which accepts an argument in the form of two numbers n:m, where n indicates the number of days between two consecutive snapshots to keep, and m means that the policy only applies to snapshots that are at least m days old. If n is zero, all snapshots older than m days will be removed.

Examples:
duplicacy prune -keep 1:7       # Keep a snapshot per (1) day for snapshots older than 7 days
duplicacy prune -keep 7:30      # Keep a snapshot every 7 days for snapshots older than 30 days
duplicacy prune -keep 30:180    # Keep a snapshot every 30 days for snapshots older than 180 days
duplicacy prune -keep 0:360     # Keep no snapshots older than 360 days

Multiple -keep options must be sorted by their m values in decreasing order.

For example, to combine the above policies into one line, it would become:

duplicacy prune -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
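
To preview the effect of such a policy before applying it, the -keep options can be combined with -dry-run and -all (a sketch only; adjust the policy values to your own needs):

duplicacy prune -dry-run -all -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7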

-exhaustive

Remove all unreferenced chunks (not just those referenced by deleted snapshots).

The -exhaustive option will scan the list of all chunks in the storage, therefore it will find not only unreferenced chunks from deleted snapshots, but also chunks that become unreferenced for other reasons, such as those from an incomplete backup.

It will also find any file that does not look like a chunk file.

In contrast, a normal prune command only identifies chunks that were referenced by the deleted snapshots and are not referenced by any other snapshot.

Example:
duplicacy prune -exhaustive

-exclusive

Assume exclusive access to the storage (disable two-step fossil collection).

The -exclusive option will assume that no other clients are accessing the storage, effectively disabling the two-step fossil collection algorithm.

With this option, the prune command will immediately remove unreferenced chunks.

WARNING: Only run -exclusive when you are sure that no other backup is running on any other device.

Example:
duplicacy prune -exclusive

-dry-run, -d

This option is used to test what changes the prune command would make. It is guaranteed not to make any changes to the storage, not even creating the local fossil collection file.

Example:

After running this, nothing will be modified in the storage, but duplicacy will show all output just like a normal run:

duplicacy prune -dry-run -all -exhaustive -exclusive

-delete-only

Delete fossils previously collected (if deletable) and don’t collect fossils.

Example:
duplicacy prune -delete-only

-collect-only

Identify and collect fossils, but don’t delete fossils previously collected.

Example:
duplicacy prune -collect-only

The -delete-only option will skip the fossil collection step, while the -collect-only option will skip the fossil deletion step.
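
One possible way to split the two steps across separate runs (a sketch only; the retention policy and scheduling shown here are arbitrary) is to collect fossils shortly after the backups and delete them in a later run:

duplicacy prune -all -keep 0:180 -collect-only
duplicacy prune -all -delete-only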

-ignore <id> [+]

Ignore snapshots with the specified id when deciding if fossils can be deleted.
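
Example (the snapshot ID vm-backups is hypothetical; substitute the ID of a repository that no longer backs up regularly):
duplicacy prune -all -keep 0:180 -ignore vm-backups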

-storage <storage name>

Prune snapshots from the specified storage instead of the default one.

Example:
duplicacy prune -storage google-drive

Notes

:bulb: Snapshots to be deleted can be specified by revision numbers, by a tag, by retention policies, or by any combination of these categories.

:bulb: Only one device should run prune

Since duplicacy encourages multiple repositories backing up to the same storage (so that deduplication will be efficient), users might want to run prune from each different repository.

However, the design of duplicacy is based on the assumption that only one instance runs the prune command (using -all), which greatly simplifies the implementation.

It is also somewhat wasteful to run the prune command against a single repository ID, since it still needs to download the snapshots of all other repository IDs in order to decide which chunks can be deleted.

Finally, race conditions can in theory occur when two instances operate on the same chunk at the same time. In practice this is unlikely, especially if each prune runs right after its backup, so the instances start at different times.

:bulb: Pruning is logged

All prune actions are logged by default locally, on the machine where the prune command is executed, under .duplicacy/logs. The prune logs are named similarly to prune-log-20171230-142510.

In the same folder you may also find log files that are empty. There is no need to worry about these: an empty log file simply means that nothing was pruned from the storage during that particular prune operation.

:bulb: -exhaustive should be used sparingly

The -exhaustive option is only needed when there are known unreferenced chunks in the storage, for example when a backup was interrupted by the user or terminated due to an error, and the files in the repository have changed afterwards.

It is not recommended to run the prune command with this option on a regular basis when there is no recent incomplete backup, mainly because if there is an ongoing backup from a different computer, the prune command will mark as fossils all the new chunks uploaded by that backup.

Although in the fossil deletion step the prune command can correctly identify that these chunks are actually referenced and turn them back into normal chunks, the cost of the extra API calls can be excessive.
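
For example, to clean up after an interrupted backup when you are certain that no other client is accessing the storage (a sketch only; drop -exclusive if another machine might be backing up at the same time):

duplicacy prune -exhaustive -exclusive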

:bulb: The last revision can only be deleted in -exclusive mode

The latest revision of each repository cannot be deleted in non-exclusive mode, because a backup for that repository may be in progress and using the latest revision as its base. Removing it could then cause chunks to be deleted even though they are still needed by the backup in progress.
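
If you really need to delete the latest revision, run prune in -exclusive mode while no backup is in progress (the revision number 25 is purely illustrative):

duplicacy prune -r 25 -exclusive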

:warning: Corner cases when prune may delete too much

There are two corner cases in which a fossil that is still needed may be mistakenly deleted. The first occurs when a backup takes more than 7 days and started before the chunk was marked as a fossil: the prune command will think the repository has become inactive and exclude it from the criteria for determining which fossils are safe to delete.

The other case occurs when an initial backup from a newly recreated repository also started before the chunk was marked as a fossil. Since the prune command doesn't know that such a repository exists at fossil deletion time, it may decide that the fossil is no longer needed by any backup and delete it permanently.

Therefore, the check command should be run after any backup that is an initial backup or that takes more than 7 days. Once a backup passes the check command, it is guaranteed that it won't be affected by any future prune operations.
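
For example, after such a backup finishes, you can verify it before the next prune runs (with no options, check verifies the default snapshot ID on the default storage):

duplicacy check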

Two-step fossil collection algorithm

The prune command implements the two-step fossil collection algorithm. It will first find fossil collection files from previous runs and check if contained fossils are eligible for permanent deletion (the fossil deletion step). Then it will search for snapshots to be deleted, mark unreferenced chunks as fossils (by renaming) and save them in a new fossil collection file stored locally (the fossil collection step).

For fossils collected in the fossil collection step to become eligible for safe deletion in the fossil deletion step, at least one new snapshot from each snapshot ID must be created between two runs of the prune command. However, some repositories may not back up on a regular schedule, which would effectively block fossils from ever becoming deletable. By default duplicacy therefore ignores repositories that have had no new backup in the past 7 days, and you can also use the -ignore option to skip certain repositories when deciding whether fossils can be deleted.


Related forum topics

Migrating from CrashPlan to Duplicacy
Confused with prune retention policy
How to proactively monitor the growth of backup size
Back up to multiple storages
Prune retention policy for Amazon Glacier
Cache usage details
Prune operation creates spurious (empty) log file
How to prevent certain snapshots from being pruned?
Prune data, remote storage is full
Feedback on Duplicacy Web Edition -- beta releases
Fix missing chunks
Duplicacy User Guide
Supported storage backends
Prune fails running my retention rules