How to backup to multiple storages needs updating

aweber · 6 November 2018 19:09

From what I can tell, the add command requires a storage name, snapshot id, and the target URL. The existing post on “Back up to multiple storages” (which is linked on the User Guide page) does not show using a snapshot-id.

I’m also not clear why a snapshot-id is required when adding an additional storage as a COPY that is also bit-identical. What is the snapshot-id used for in this case?

Also, reading through the different pages, the concept of using rsync/rclone seems to indicate that if you add the additional storage that way, you can then ONLY rsync the “chunks” folder to keep it consistent? Is that accurate?

mathome1 · 6 November 2018 22:46

I am no expert at this, as I am also new to Duplicacy and in the trialing phases and working through the issues.

But here are a few answers to your questions for you to consider:

The “snapshot id” in the add command and the “repository_id” in the Back up to multiple storages page I assume you are referencing are the same thing. It might be a little less confusing for newbies if the terms in the documentation were consistent.
The “snapshot id” or “repository_id” is needed to uniquely identify the particular repository that is being backed up at the remote storage. It is needed because not only is it possible to back up a repository to multiple storages (e.g. a local storage and an offsite storage), it is also possible to back up multiple repositories to a single storage location. And in fact this is sensible to do if, for example, you want to get the benefits of deduplication across multiple repositories, or even across different repositories on different computers. I.e. if you are backing up computer “x” and repository “a” AND computer “y” and repository “b” to the same storage, you need the “snapshot id” or “repository_id” to uniquely identify the separate backups.
Maybe I have misunderstood the “not clear why a snapshot-id is required” part of your question. If you already understood the points above, maybe you are asking why a copy command needs to be given a “snapshot id” or “repository_id” when this is already defined in the configuration you are copying. This seems a reasonable question, and for a lot of people I assume there will be no reason to redefine it, so it could simply be copied. However, allowing it to be defined enables name conflicts to be addressed and gives significant flexibility in the way backups can be done. For example, if you read the “Multiple cloud storage without a local storage” example in the link above, they give one example where you need to have different “snapshot-ids” (or “repository_id” depending on your terminology) for what is in fact the exact same repository backup on the same storage. Another example: someone in an organisation starts backing up to local storage and calls the “snapshot id” “documents”, which is unique enough locally. But later they want to back up to a shared cloud storage used by lots of computers. This name would no longer be unique enough, so in this case it is handy to be able to define it again.
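
To make the shared-storage case concrete, here is a minimal Python sketch (not Duplicacy code). It assumes a storage keeps one file per revision under `snapshots/<snapshot id>/<revision>` next to a single shared `chunks` folder; the ids `x-a` and `y-b` and the helper name `snapshot_path` are made up for illustration.

```python
from pathlib import PurePosixPath

def snapshot_path(snapshot_id: str, revision: int) -> PurePosixPath:
    """Path of one revision file inside a storage (assumed layout)."""
    return PurePosixPath("snapshots") / snapshot_id / str(revision)

# Two repositories on two computers backing up to the same storage.
backups = [("x-a", 1), ("x-a", 2), ("y-b", 1)]
paths = [str(snapshot_path(sid, rev)) for sid, rev in backups]
print(paths)

# If both computers used the same id, their first revisions would
# map to the same path and could not be told apart.
clash = snapshot_path("documents", 1) == snapshot_path("documents", 1)
print(clash)
```

With distinct ids each repository gets its own folder of revisions while still sharing the chunks, which is where the cross-repository deduplication would come from.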

I suspect the cause of your confusion might be the same issue that caused me a little confusion. I personally think the term “snapshot id” is a little confusing. I think of a snapshot as a point in time, so when I read it in the Duplicacy documentation I take it to mean one particular version of a backup. But in fact that is known as a “revision”. So my suggestion for the developer (in my limited experience with Duplicacy and if I have understood it correctly) is to consider getting rid of the term “snapshot id” and standardising on “repository_id” in all the documentation and help files to avoid this potential confusion???

gchen · 7 November 2018 01:48

@mathome1’s explanation is correct, and the “Multiple cloud storage without a local storage” example was also my first thought. Just want to add that the snapshot-id isn’t used when you only copy from another storage to this new storage.

No. If you add the storage using -copy and -bit-identical, you can use rsync to copy the chunks folder, in addition to the copy command. In other words, the copy command always works, but rsync works only when the storage was added with -bit-identical.
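
As a rough illustration of the two commands involved, here is a small Python sketch that only builds the command lines and prints them; it does not run Duplicacy. The storage name `offsite`, the snapshot id `my-backups` and the URL `b2://my-bucket` are placeholders, and the helper names are made up for this example.

```python
import shlex

def add_copy_cmd(new_name, snapshot_id, url, source="default", bit_identical=True):
    """Build (not run) a `duplicacy add` command that copies the source storage's config."""
    cmd = ["duplicacy", "add", "-copy", source]
    if bit_identical:
        cmd.append("-bit-identical")  # needed if the chunks are to be rsync'ed
    cmd += [new_name, snapshot_id, url]
    return cmd

def copy_cmd(source, target):
    """Build (not run) a `duplicacy copy` command between two storages."""
    return ["duplicacy", "copy", "-from", source, "-to", target]

add = add_copy_cmd("offsite", "my-backups", "b2://my-bucket")
cp = copy_cmd("default", "offsite")
print(shlex.join(add))
print(shlex.join(cp))
```

The snapshot id is still a positional argument here even though, as noted above, it is not used when you only copy from one storage to the other.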

Christoph · 7 November 2018 06:40

Yes, you are not alone with this. The problem is that changing the terminology may well cause even more confusion…

aweber · 7 November 2018 13:01

Is there a detailed explanation of the storage format and directories? It does not seem logical that if I were to replicate the exact storage from one target to another, the copy might not work. (I was surprised when I read to only copy the “chunks” folder in any case.)

To simplify it with an example: If I make two SMB mounts identical by copying all the files and directories from one to the other…including file properties, how can it be that only the original mount will be assured to work, but the copy is not assured to work??? They are identical.

Also: It seems like a relatively common scenario where backups would go to an on-site location and then be “replicated” to cloud storage. There is a kludgy way to do this by making a “dummy repository”. If the COPY command is the only reliable way to make two stores identical (yes, you can copy only subsets of revisions), can I suggest a FR to make a “SYNC” command that just simplifies that configuration? Then maybe there’s an easy “ADD -MIRROR” you could run on your local repositories?

To the original question, there are a lot of “moving parts” in these configurations. Standardizing on terminology would be a huge gain in users comprehending how to configure and how it works. (And I agree with @mathome1 that “snapshot_id” is an awful synonym for “repository_id”, as traditional backup terminology uses “snapshot” generally where you are using “revision”.)

Documenting that “during a copy, repository_id is not used, because it is copied along with the source’s info” would be very helpful if that’s the case. When the -COPY option is specified, it would be even better to NOT require the parameter at all.

gchen · 7 November 2018 16:29

Of course if you rsync the entire storage directory including the config file to a different place it should just work.

When both -copy and -bit-identical are given, you’re basically re-encrypting the config file with a different master password. All chunks will be encrypted using the same set of encryption passwords (randomly generated and stored in the config file) so they can be rsync’ed. This is useful when you want to use different passwords for the local and cloud storages.

With only -copy, a new config file with a new set of randomly generated passwords will be created, so only the copy command will work.
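
A toy Python model of that difference, assuming nothing about Duplicacy's real file formats: the config is reduced to a single random key, and `chunk_name` is a hypothetical stand-in for "what a stored chunk looks like depends on the keys in the config". All names here are invented for the sketch.

```python
import hashlib
import hmac
import secrets

def new_config():
    """A storage config reduced to one randomly generated chunk key."""
    return {"chunk_key": secrets.token_bytes(32)}

def add_storage(source_config, bit_identical):
    """Toy version of adding a storage with -copy, with or without -bit-identical."""
    if bit_identical:
        return dict(source_config)  # same keys; only the master password would differ
    return new_config()             # fresh random keys

def chunk_name(config, data):
    """Stand-in for how a stored chunk depends on the config's keys."""
    return hmac.new(config["chunk_key"], data, hashlib.sha256).hexdigest()

local = new_config()
data = b"some file content"
identical = add_storage(local, bit_identical=True)
copy_only = add_storage(local, bit_identical=False)

same_with_bit_identical = chunk_name(local, data) == chunk_name(identical, data)
same_with_copy_only = chunk_name(local, data) == chunk_name(copy_only, data)
print(same_with_bit_identical, same_with_copy_only)
```

In this model, files produced for the bit-identical storage match the originals byte for byte, so a plain file copy is enough; with copy-only the keys differ, so the data has to pass through something that knows both sets of keys, which is the role the copy command plays.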

aweber · 7 November 2018 17:17

But the “trick” here is to add the necessary reference info to the local repository so that you can restore from a different storage (but with all the same remaining parameters/settings).

I think what would be needed in that case is an “ADD” that just copies the current config (whether that’s default or another, named ID), but allows the storage URI to be changed. I believe that would then be all the client repository needs to restore from either the “on premise storage” or the “cloud storage” – or really any set of synchronized storages for the same repository id, right???
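
Roughly what I have in mind, as a Python sketch of the proposed behaviour. This is not an existing Duplicacy command: `add_mirror` is hypothetical, and the field names `name`, `id` and `storage` are only illustrative, not necessarily what Duplicacy stores in the repository.

```python
import copy

def add_mirror(preferences, source_name, new_name, new_url):
    """Return preferences plus a new entry cloned from source_name but pointing at new_url."""
    source = next(p for p in preferences if p["name"] == source_name)
    mirror = copy.deepcopy(source)  # keep every other setting as-is
    mirror["name"] = new_name
    mirror["storage"] = new_url
    return preferences + [mirror]

# Placeholder values for a repository with one on-premise storage.
prefs = [{"name": "default", "id": "documents", "storage": "/mnt/backup"}]
prefs = add_mirror(prefs, "default", "cloud", "b2://my-bucket")
print(prefs)
```

Both entries keep the same repository id, so a restore could point at either one as long as the two storages are kept in sync.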