Prune from multiple machines

The Guide says: “Please note that if there are multiple Duplicacy instances running on different computers that back up to the same storage, only one Duplicacy instance can have the pruning option enabled.”

Let’s say I have two machines backing up to the same storage, each with a different repository id.
Does it mean I can run “duplicacy prune” from one computer but not the other one?
And if I want to prune the second machine’s backup I need to do it from the first machine?

That does not sound right. Can you please clarify?

From what I know, the prune command has the -all argument, which matches against all the repositories.

Therefore I think that duplicacy prune -all will prune all the repositories available in the selected storage, and this is why prune should be run from only one computer.

If you have multiple storages, then I think you need to run prune -all once for each storage: duplicacy prune -all -storage <the name>.
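To make the "once per storage" idea concrete, here is a minimal sketch. The function name prune_each_storage and the DUPLICACY override variable are made up for illustration; only the prune command with its -all and -storage options comes from Duplicacy itself. It assumes the script is run from inside an initialized repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of Duplicacy): run one prune per named storage.
# DUPLICACY is an assumed override variable; DUPLICACY=echo previews the
# commands instead of running them.
prune_each_storage() {
    for storage in "$@"; do
        # -all applies the prune to every repository id in that storage.
        # Add -keep options here to define the retention policy.
        "${DUPLICACY:-duplicacy}" prune -all -storage "$storage" || return 1
    done
}
```

For example, DUPLICACY=echo prune_each_storage s3 b2 prints the two commands that would be run, where s3 and b2 are placeholder storage names.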

This makes sense. I guess my question was more about what happens if I prune from multiple machines. Will it result in data loss? Or is it simply not advisable because it would be hard to tell which snapshots will remain and which will be deleted, especially if each machine prunes with different settings?

Looking at the documentation, I don’t see why it would cause any data loss, and there is nothing special about the first computer; it can be any of them. But the statement in the documentation telling you not to do it is a bit concerning.

Actually, I have been doing exactly that for the past year or so with zero issues, and only now saw that statement in the guide. It is not mentioned anywhere else, so perhaps the guide is outdated?

You mean you run prune -all from all the computers? I don’t understand what you were doing.

I think @gchen can better explain this than me?

Ah, sorry for the confusion.

On machine MA I have repository RA backing up to storage S.
On machine MB I have repository RB backing up to the same storage S.

On both machines I have a launchd job scheduled every hour that does the following:

duplicacy backup
duplicacy prune -keep 31:360 -keep 7:90 -keep 1:14
duplicacy prune

Question: is that OK? It does contradict the guide, but it seems to work fine.

There’s no need for 2 prunes, just duplicacy prune -keep 31:360 -keep 7:90 -keep 1:14 is enough.

Since you are running the prune command on both machines w/o the -all flag, then this is actually intended behavior from my point of view.

I am running the same atm (my backup patterns are odd) (same = prune each repo from “itself”).

There’s no need for 2 prunes, just duplicacy prune -keep 31:360 -keep 7:90 -keep 1:14 is enough.

My understanding was that the first pass creates fossils and the second pass is needed to clean them up, rather than waiting until the next invocation. Am I wrong?

Since you are running the prune command on both machines w/o the -all flag, then this is actually intended behavior from my point of view.

Yep. That’s what I thought too, and I have been doing it for the past year with no issues.

But according to the guide:

Please note that if there are multiple Duplicacy instances running on different computers that back up to the same storage, only one Duplicacy instance can have the pruning option enabled.

So, are we wrong by doing it?

Actually, I suspect that the guide is outdated, but it would be great to get an official opinion, perhaps from gchen.

The design was based on the assumption that only one instance can run the prune command. This greatly simplifies the implementation. Theoretically, race conditions can happen when two instances try to operate on the same chunk at the same time, but in practice this may never happen, especially if the prune command runs after the backup, so that the prunes start at random times. In addition, it is somewhat wasteful to have a prune command work on one repository id only, since it still needs to download all backups for all repository ids in order to decide which chunks are to be deleted.
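A setup that follows this assumption could be sketched as below: every machine backs up, but only one designated machine prunes, and it does so with -all. The function name, the PRUNE_HOST variable, and the DUPLICACY override are assumptions made for illustration; the retention values are the ones posted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Duplicacy): every machine backs up,
# but only the designated machine prunes.
# PRUNE_HOST is an assumed variable naming the single machine allowed to prune.
backup_and_maybe_prune() {
    this_host=$1
    "${DUPLICACY:-duplicacy}" backup || return 1
    if [ "$this_host" = "${PRUNE_HOST:-MA}" ]; then
        # -all covers every repository id in the storage, so one machine
        # can handle retention for RA and RB alike.
        "${DUPLICACY:-duplicacy}" prune -all -keep 31:360 -keep 7:90 -keep 1:14
    fi
}
```

On machine MB this would only run the backup; on MA it would run the backup followed by the single prune.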


todotbp: wiki this


Thank you for getting back! I’ll rethink my backup strategy and make sure to schedule pruning from only one machine, with the -all flag. In fact, it probably does not even need to run after every backup; once per week, for example, would be perfectly acceptable as well.
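One possible way to get the once-per-week behaviour from a job that still fires hourly is to gate the prune on a timestamp file. This is only a sketch: the function names, the stamp-file location, and the DUPLICACY override are all assumptions, and it relies on date +%s, which works on macOS and Linux but is not guaranteed everywhere.

```shell
#!/bin/sh
# Hypothetical weekly gate (not part of Duplicacy).
WEEK_SECONDS=604800   # 7 * 24 * 60 * 60

# prune_due LAST NOW: succeeds when at least a week separates the two
# epoch timestamps.
prune_due() {
    [ $(( $2 - $1 )) -ge "$WEEK_SECONDS" ]
}

weekly_prune() {
    stamp=${1:-"$HOME/.duplicacy-last-prune"}   # assumed stamp-file location
    last=$(cat "$stamp" 2>/dev/null || echo 0)  # 0 means "never pruned"
    now=$(date +%s)
    if prune_due "${last:-0}" "$now"; then
        # Record the time only if the prune itself succeeded.
        "${DUPLICACY:-duplicacy}" prune -all -keep 31:360 -keep 7:90 -keep 1:14 \
            && echo "$now" > "$stamp"
    fi
}
```

The hourly job on the pruning machine would then call duplicacy backup followed by weekly_prune; on most runs the gate fails and nothing is pruned.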

And yes, ideally that should also be in the wiki for the command-line version; some people may never use the GUI and so never see that warning.