So one way of handling multiple instances is to prevent them via a script (and it would be great if someone could share an equivalent script for Windows).
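For reference, on Linux the kind of prevention script I have in mind is just a lock-file wrapper around the backup command. A minimal sketch of my own, with placeholder paths (not the exact script shared earlier):

```sh
#!/bin/sh
# Minimal sketch of a "single instance" wrapper on Linux: flock -n fails
# immediately if another run already holds the lock, so a second backup
# simply refuses to start. The repository and lock-file paths are placeholders.
cd /path/to/repository || exit 1

flock -n /tmp/duplicacy-backup.lock duplicacy backup \
  || echo "another duplicacy backup seems to be running; skipping this run"
```

(A Windows equivalent would presumably do the same thing with a lock file or a named mutex, but I'll leave that to someone more fluent in batch/PowerShell.)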
But let me explore the other option: not caring about multiple instances. (I like that option because it simplifies things quite a bit, and it can also speed things up if one instance isn’t able to use all your upload bandwidth.) It seems to me that what gchen says can be interpreted as “it doesn’t usually matter how many instances are running”, right?
If that is the case, we still need to be clear about what the risks are (and see whether they can be avoided by means other than preventing multiple instances). So: what are the potential problems?
To start with, let me clarify that the scenario in the OP is just one specific case of multiple instances:
two backups with the same parameters, from the same repo to the same storage, running at the same time
Other possible cases of multiple instances (and my tentative interpretation):
- same repo, different storage: no problems whatsoever
- different repo, different storage: no problem whatsoever
- different repo, same storage: tricky. No problem whatsoever if there are no shared chunks between the two repositories. Chunks can be shared either because the repositories overlap (i.e. some folders are included in both repositories) or because identical files (or file parts?) happen to exist in both repositories. The case of overlap can be treated as identical to the OP scenario (i.e. same repo, same storage), because we know for sure that chunks are shared. I’m not sure about the case of incidentally shared chunks, but to be on the safe side, let’s also treat it as identical to “same repo, same storage”.
If the above is correct, we can note that multiple instances of duplicacy are only a matter of concern if they are backing up to the same storage. (BTW: what about duplicacy instances running something other than the backup command? I will leave that aside for the time being.)
So now, what are the risks? gchen says:
some chunks referenced by the earlier backup whose snapshot file gets overwritten will become unreferenced. However, if files do not change between these two backups, there won’t be any such chunks.
So there is another huge scenario, for which we can say multiple instances are no problem whatsoever: when none of the files (or file parts?) shared by the two backup jobs changes while those instances are running.
Now, what if they do? Do we lose data? Well, the data is there (the chunks have been uploaded), but it is “invisible” to any restore process, because those chunks are not referenced by any snapshot.
That leaves us with two questions:
- Can the data from the unreferenced chunks somehow be restored, assuming that the files in the repository have been destroyed for ever by a nuclear disaster?
- Under what circumstances will those unreferenced chunks disappear?
I cannot answer question 1, but I think the answer to question 2 is: only when `duplicacy prune -exhaustive` is run. If that is correct, and if the answer to question 1 is yes, then I am tempted to conclude that we can safely not care about running multiple instances of duplicacy, provided that we use `duplicacy prune -exhaustive` carefully, i.e. only when we know (but how do we know?) that it will not delete potentially needed orphaned chunks.
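For example, a cautious habit, assuming I’m reading the prune options correctly, would be to always do a dry run first and inspect what would be deleted:

```sh
# Preview only: list the unreferenced chunks that prune -exhaustive would
# remove, without deleting anything (assuming -dry-run behaves as documented).
duplicacy prune -exhaustive -dry-run

# Only after checking that output (and making sure no other instance is in
# the middle of a backup) run it for real:
duplicacy prune -exhaustive
```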
Phew! Forgive me for thinking out loud at such length; I thought it was the best way to clarify these questions and to identify and correct any mistakes I might have made in the above.