Pruning retention policies for multiple repositories

kgorlen · 11 October 2020 20:08

I’m looking to migrate from CrashPlan. I have multiple repositories that correspond to CrashPlan Backup Sets. The retention policy for most of them is the default CrashPlan policy:

-keep 30:365 -keep 7:90 -keep 1:7

But for a few repositories, e.g. Public and Multimedia, the policy should be the same except that -keep 0:365 replaces -keep 30:365 (i.e., no revisions older than 365 days are kept). Will the following commands accomplish this?

duplicacy prune -id Public -keep 0:365
duplicacy prune -id Multimedia -keep 0:365
duplicacy prune -all -keep 30:365 -keep 7:90 -keep 1:7

If not, please explain why and what commands I should use. Thanks in advance.

gchen · 12 October 2020 19:51

I think what you got is right.
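A simplified model of how a list of -keep n:m options is applied may help show why the three commands above work. It is a sketch under stated assumptions, not duplicacy's implementation. Ages are whole days, revisions are processed oldest first, and the first option whose m is at most a revision's age applies. With n = 0 that revision is deleted; otherwise it is kept only if it is at least n days newer than the last revision kept under the same option. The function name retention_model is hypothetical.

```sh
#!/bin/sh
# Simplified model of how duplicacy applies a list of "-keep n:m" options.
# Assumptions (not taken from duplicacy's source): ages are whole days,
# revisions are given oldest first, and options are listed largest m first.

# retention_model POLICIES AGES
#   POLICIES: space-separated n:m pairs, e.g. "30:365 7:90 1:7"
#   AGES:     space-separated revision ages in days, oldest first
# Prints "<age> keep" or "<age> delete" for each revision.
retention_model() {
  _policies=$1
  _ages=$2
  _current=""     # the n:m option in effect for the previous revision
  _last_kept=""   # age of the last revision kept under that option
  for _age in $_ages; do
    # The first option whose m is <= the age applies (options sorted by m, descending).
    _applicable=""
    for _p in $_policies; do
      if [ "$_age" -ge "${_p#*:}" ]; then
        _applicable=$_p
        break
      fi
    done
    if [ -z "$_applicable" ]; then
      echo "$_age keep"        # newer than every m: always kept
      continue
    fi
    if [ "$_applicable" != "$_current" ]; then
      _current=$_applicable    # entering a new age band: restart the interval count
      _last_kept=""
    fi
    _n=${_applicable%%:*}
    if [ "$_n" -eq 0 ]; then
      echo "$_age delete"      # n = 0: keep nothing older than m
    elif [ -n "$_last_kept" ] && [ $((_last_kept - _age)) -lt "$_n" ]; then
      echo "$_age delete"      # too close to the previously kept revision
    else
      _last_kept=$_age
      echo "$_age keep"
    fi
  done
}
```

For example, retention_model "0:365 7:90 1:7" "400 100 93 91 5" marks the 400- and 91-day-old revisions for deletion and keeps the rest. In this model, pruning Public with -keep 0:365 and then with the default policy leaves the same revisions as the single policy -keep 0:365 -keep 7:90 -keep 1:7, because nothing older than 365 days survives the first run.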

kgorlen · 30 October 2020 18:45

I’ve also created separate repositories containing year-end snapshots from CrashPlan, e.g. -id User1-Archive, -id User2-Archive, …, with revisions tagged 2011-12-31, 2012-12-31, …, which should never be pruned. So I need to ignore these, e.g.:

duplicacy prune -all -ignore User1-Archive -ignore User2-Archive -keep 30:365 -keep 7:90 -keep 1:7

Correct?

ossssip · 1 January 2021 14:12

A related question: I have a storage with multiple repositories that each have their own retention policy. My pruning routine is similar to the one in the first post: one duplicacy prune -all call with the general retention policy, followed by a few duplicacy prune calls for individual snapshot ids with stricter policies. This approach works well for me, with one drawback: each individual duplicacy prune call retrieves the complete list of snapshots and chunks, which is quite time-consuming for certain storages. As a result, the prune job takes many hours, simply because each new duplicacy instance reads the chunk tree again and again. Would it make sense to extend the prune command so that it accepts a separate retention policy for each of several -id options? Something like this:

duplicacy prune -id A -keep 7:7 -id B -id C -keep 7:30 -keep 1:7

I think that would significantly reduce the pruning time in my case.

Droolio · 1 January 2021 16:32

Definitely agree that it’d be nice if multiple -ids could be specified in a single run…

However, as far as I understand, every prune run has to consider all snapshots in the storage: it compiles an in-memory list of the chunks referenced by every snapshot so that it can determine which chunks are safe to delete, even if you only prune one id. It doesn’t actually list all chunks on disk, though (unless you specify -exhaustive).
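Why every run needs all snapshots can be sketched as a set difference: a chunk belonging to a revision being removed can only be deleted if no surviving revision, from any snapshot id, still references it. This is a simplified model, not how duplicacy is implemented. It assumes hypothetical plain-text files with one chunk hash per line, and it ignores duplicacy's two-step fossil collection. The function name deletable_chunks is hypothetical.

```sh
#!/bin/sh
# Simplified model of why prune needs chunk lists from ALL snapshot ids.
# Assumption: each file holds one chunk hash per line (a hypothetical format;
# duplicacy gets this information from the snapshot files in the storage).

# deletable_chunks REMOVED_LIST SURVIVOR_LIST...
#   REMOVED_LIST:  chunks referenced by the revisions being pruned
#   SURVIVOR_LIST: chunks referenced by surviving revisions of any id
# Prints the chunks that no surviving revision references.
deletable_chunks() {
  _removed=$1
  shift
  _referenced=$(mktemp) || return 1
  # Union of every chunk still in use; /dev/null avoids reading stdin when no survivor lists are given.
  sort -u /dev/null "$@" > "$_referenced"
  # Keep only lines unique to the removed list (comm needs both inputs sorted).
  sort -u "$_removed" | comm -23 - "$_referenced"
  _status=$?
  rm -f "$_referenced"
  return $_status
}
```

If only one id's surviving revisions were passed in, a chunk that another id still uses would be reported as deletable, which is why even a prune -id run has to read every snapshot.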

Additionally, if you’re applying multiple retention periods in a single ‘run’, you may want to look at the -collect-only and -delete-only flags.

A normal prune performs both the collect and the delete steps, so it may be more efficient to run every prune except the last one with -collect-only, and the last one without either flag. (Alternatively, run each id’s prune with -collect-only and finish with a single prune -delete-only without any ids, which makes the script easier to write and read.)
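A script following the second suggestion might look like the sketch below. The ids A, B and C and their policies come from the example earlier in the thread; the flags (-id, -all, -keep, -collect-only, -delete-only) are the ones named in the thread. The DUPLICACY variable and the function name prune_sequence are hypothetical additions so the commands can be previewed with DUPLICACY=echo before running them for real.

```sh
#!/bin/sh
# Sketch: collect fossils for several retention policies, then delete once.
# DUPLICACY is a hypothetical override; set DUPLICACY=echo to print the commands only.
DUPLICACY=${DUPLICACY:-duplicacy}

prune_sequence() {
  # Per-id policies (ids and policies from the example in this thread)
  "$DUPLICACY" prune -collect-only -id A -keep 7:7 || return 1
  "$DUPLICACY" prune -collect-only -id B -keep 7:30 -keep 1:7 || return 1
  "$DUPLICACY" prune -collect-only -id C -keep 7:30 -keep 1:7 || return 1
  # General policy applied to every id
  "$DUPLICACY" prune -collect-only -all -keep 30:365 -keep 7:90 -keep 1:7 || return 1
  # Single final pass that only deletes previously collected fossils
  "$DUPLICACY" prune -delete-only
}
```

Because duplicacy deletes fossils in a separate step from collecting them, fossils collected by the earlier commands may not yet be deletable when the final command runs; in that case a later -delete-only run may be the one that actually removes them.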

kgorlen · 1 January 2021 18:22

I discovered, thanks to -dry-run, that -ignore doesn’t prevent a snapshot from being pruned, so I have to run 10 prune commands on individual snapshot ids to avoid pruning my four archival snapshots. Wouldn’t it be easier and less error-prone to save a retention policy for each snapshot id, including a “never prune” policy, in the storage? Then a simple prune -all command would do the job.
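Until per-id policies can be stored, one workaround is to keep the per-id policies in a script, so the list of prune commands is generated rather than typed by hand. (According to the duplicacy prune documentation, -ignore only excludes an id when deciding whether fossils can be deleted, which is consistent with it not protecting revisions from pruning.) In the sketch below, the policies are the ones from this thread. The case patterns, the function names policy_for and prune_ids, the example id Docs and the DUPLICACY override are hypothetical.

```sh
#!/bin/sh
# Sketch: prune each snapshot id with its own policy, skipping archive ids.
# DUPLICACY is a hypothetical override; set DUPLICACY=echo to print the commands only.
DUPLICACY=${DUPLICACY:-duplicacy}

# policy_for ID: print the -keep options for an id, or nothing if it must never be pruned
policy_for() {
  case $1 in
    *-Archive) ;;                                                # never prune: no policy
    Public|Multimedia) echo "-keep 0:365 -keep 7:90 -keep 1:7" ;;
    *) echo "-keep 30:365 -keep 7:90 -keep 1:7" ;;               # default policy
  esac
}

# prune_ids ID...: run one prune command per id that has a policy
prune_ids() {
  for id in "$@"; do
    policy=$(policy_for "$id")
    [ -n "$policy" ] || continue
    # $policy is deliberately unquoted so it splits into separate arguments
    "$DUPLICACY" prune -id "$id" $policy || return 1
  done
}
```

For example, prune_ids User1-Archive Public Docs would skip User1-Archive and issue one prune command each for Public and Docs. Adding -collect-only to these commands and a final prune -delete-only, as suggested above, should reduce the deletion passes to one.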