Back up to multiple storages

multi-storage

#1

There is always a danger of losing all your data if it is stored in only one place. It is therefore highly recommended to back up your data to at least two different storage providers. Duplicacy provides a unique tool to make this backup redundancy solution super easy.

When a repository is initialized you always provide the url of the default storage, which will be initialized if it hasn’t been already. (:bulb: Even though you are not told directly, this storage has the name default for easy access.)

cd /path/to/repository
duplicacy init repository_id onsite_storage_url

You can add additional storages to this repository by running the add command.
The first argument to add is the name of the new storage, which other commands can then use (instead of the normal default).

# add an additional storage named `offsite_storage`:

duplicacy add offsite_storage repository_id offsite_storage_url

Now when you run the backup command, the backup will by default be stored on the default storage:

duplicacy backup

Therefore, when you want to back up to the new storage, you have to select it explicitly when running backup:

duplicacy backup -storage offsite_storage

This works, but it is not the best practice, for two reasons:
First, you are running the backup twice: once for the default storage and a second time for the offsite_storage storage. This consumes double the CPU and disk resources and takes longer, for what is meant to be a single redundant backup.
Second, if some files change between the two backup commands (e.g. you rename a picture, or change the rating of a song), you end up with two different backups, making the management of backups on the two storages a bit more complex.


The recommended way is to use the copy command to copy from the default storage to the additional storage (offsite_storage). This way, you’ll always get identical backups on both storage providers:

duplicacy copy -from default -to offsite_storage
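
If the copy feels slow, you can also parallelize the uploads with the -threads option (the thread count below is an arbitrary example):

duplicacy copy -threads 4 -from default -to offsite_storage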

Of course, you may be able to use third-party tools, such as rsync or rclone, to copy the content of one storage to another (:grey_exclamation: in this case don’t forget about using --bit-identical as explained here).
But compared with rsync/rclone, the copy command can copy only a selected set of revisions instead of everything. Moreover, if the two storages are set up differently (such as when one is encrypted and the other is not), the copy command is your only choice.
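
For example, assuming revisions 5 through 10 exist on the default storage, copying just that range should look something like this:

# copy only revisions 5-10 of this repository id (the revision numbers are illustrative)
duplicacy copy -id repository_id -r 5-10 -from default -to offsite_storage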


It is also possible to run the copy command directly on your onsite storage server. All you need to do is create a dummy repository there, initialize it with the same default storage (but accessed as a local disk), and add the same additional storage:

# log in to your local storage server
mkdir -p /path/to/dummy/repository
cd /path/to/dummy/repository
# same repository id as before; the storage url is now the local path of the onsite storage
duplicacy init repository_id onsite_storage_url
duplicacy add -copy default --bit-identical offsite_storage repository_id offsite_storage_url
duplicacy copy -from default -to offsite_storage

This not only frees your work computer from running the copy command, but also speeds up the copy process, since Duplicacy can now read chunks from a local disk
(local server -> router -> offsite_storage)
instead of over the network
(local server -> router -> your computer -> router -> offsite_storage).
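
If you want the copy to happen regularly, you can schedule it on the storage server itself; a minimal sketch using cron (the schedule and log path are assumptions, adjust to taste):

# run the offsite copy every night at 02:00 from the dummy repository
0 2 * * * cd /path/to/dummy/repository && duplicacy copy -from default -to offsite_storage >> /var/log/duplicacy-copy.log 2>&1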

Multiple cloud storages without a local storage

If you’re backing up to two cloud storage providers without a local storage, issuing a copy command between the two cloud storages will cause all data to be downloaded from one cloud storage and uploaded to the other.

This can be slow, and may incur extra download costs.
To avoid this, while maintaining identical backups on both storage destinations, you can add the destination storage twice, with two different snapshot ids.

One is used to issue “direct” backups (you can also think of this snapshot id as a dummy; read on) to the destination cloud storage, and the other is used to copy the real backups from the source storage to the destination storage.

This should work better because a “direct” (dummy) backup should hopefully share many duplicate chunks with the (real) backup performed later by the copy operation (the two will differ only if files change between the direct backup and the copy).

Since the upload of files to the second storage is done by the backup to the dummy snapshot id, rather than by the copy to the real snapshot id, only the (very few) chunks that changed between the two backups have to be downloaded from the first storage when the copy command runs, significantly reducing the amount of download traffic.

(:bulb: This trick relies on the fact that most storage providers offer free uploads and charge only for downloads, so check whether this is the case for your providers as well!)

duplicacy init --storage-name backblaze my-backups b2://bucket
duplicacy add -copy backblaze --bit-identical wasabi_real_storage my-backups wasabi://bucket          # used for copying the real backups
duplicacy add -copy backblaze --bit-identical wasabi_dummy_storage my-backups_dummy wasabi://bucket   # used for the direct/dummy backups
duplicacy backup -storage backblaze
duplicacy backup -storage wasabi_dummy_storage
duplicacy copy -from backblaze -to wasabi_real_storage
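
The first three commands are one-time setup; the last three form the recurring backup cycle. As a sketch of a script you might schedule (same commands and names as above, and the order matters):

#!/bin/sh
# recurring cycle for the two-cloud setup
duplicacy backup -storage backblaze                       # 1. real backup to the first storage
duplicacy backup -storage wasabi_dummy_storage            # 2. direct upload of (mostly) the same chunks to the second
duplicacy copy -from backblaze -to wasabi_real_storage    # 3. copy real snapshots; most chunks already exist there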

Pruning

It is worth mentioning that the copy command is non-destructive: data pruned from one storage will not automatically be pruned from the other.

Example: duplicacy copy -from onsite -to offsite

For a system running regular copy and prune operations, the following scenarios are possible:

  • If you prune only the onsite storage, the offsite storage will never be pruned.
  • If onsite pruning is equal to offsite pruning, this is perfectly fine (see the example below).
  • If onsite pruning is more aggressive than offsite pruning, this works (but is not a great idea).
  • If onsite pruning is less aggressive than offsite pruning, this works (but it is inefficient to keep copying data that will imminently be pruned). If you want to keep the offsite storage lighter than the onsite one, you need to copy specific revision numbers with the -r option.
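
For the equal-pruning scenario, you can simply run the same prune policy against both storages; the -keep values below are only an illustrative retention policy:

# keep one snapshot per week after 30 days, one per month after 180 days, none after a year
duplicacy prune -storage onsite -keep 0:360 -keep 30:180 -keep 7:30
duplicacy prune -storage offsite -keep 0:360 -keep 30:180 -keep 7:30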

#2

I’m not sure that I correctly understand what you explained in the scenario “Multiple cloud storages without a local storage”:

I started from scratch with only existing directories on the remote sftp-storages.
The option -copy on the second line of the code example ensures that the two storages have the same configuration, and nothing else, right? When giving this command I got the message “Repository has not been initialized” ???
But it seems to make no sense to back up to the storage “wasabi-direct” (line 5) and then make a copy from backblaze to wasabi.

Further explanations would be great
Thank you


#3

This is because the local directory to be backed up (called the repository) has not been initialized. There was likely an error when you ran the first command (duplicacy init --storage-name backblaze my-backups b2://bucket).

If you run the copy command without backing up to wasabi-direct first, most chunks will need to be downloaded from backblaze and then uploaded to wasabi. By backing up to wasabi-direct first, you’ll save a lot on Backblaze egress fees.


#4

I have edited the #how-to in order to try and explain things a bit better. Does this explanation look simpler to you, @ralf?


#5

Yes, this is clearer. In the last line of the example code you wrote “… -to wasabi_storage”. Am I right that this should be “… -to wasabi_real_storage”?

Thank you


#7

Thanks for your quick reply.
I will give it another try, because I didn’t notice any errors while running the commands.


#8

" (in this case don’t forget about using --bit-identical as explained here)."

That link has no explanation about “-bit-identical”. Does -bit-identical have to be specified when adding a second storage, or only when running the copy command? Or at both stages?


#9

I think @TheBestPessimist wanted to point to the add command showing the -copy and -bit-identical options: this link.

Only when adding / creating a new storage with the add command. The -bit-identical option will make it compatible with the initial storage created.


#10

Just to clarify: -bit-identical isn’t required to make it compatible with the original storage; that’s what -copy does. -bit-identical was added primarily so you can use third-party tools like rsync to seed or maintain the copy: it forces the chunk filenames, and the hashes they’re encrypted with, to be the same as the ones stored in the encrypted config file.

Without it, a copy-compatible storage (made with -copy) will still be created, but corresponding chunks of the same size will be encrypted with a different set of hashes and end up with different filenames. You can’t use rsync, but Duplicacy copy will work just fine.
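
For example, if both storages were created with -bit-identical and are plain local or sftp storages, seeding the copy could look like this (paths and host are hypothetical):

# mirror the onsite storage directory to the offsite server over ssh;
# only safe because -bit-identical makes chunk filenames identical on both sides
rsync -av /mnt/onsite_storage/ user@offsite-server:/mnt/offsite_storage/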


#11

You explained that splendidly! :ok_hand::+1:


#12

Re. copying from the onsite storage:

duplicacy init repository_id onsite_storage_url

Does this also work if the onsite_storage has been previously initialized with encryption?


#13

Yes: when you try to init or add a storage which is already initialised, duplicacy will figure that out and use the existing storage settings.


#14

Hmmm, I must be doing something wrong then…

From my laptop I did

duplicacy init -e -storage-name local-server Downloads sftp://user@localserver:port//mnt/data/backup/laptop

I then did

duplicacy backup -stats

which ran successfully.

I logged in to the local server and did

mkdir -p /mnt/data/backup/dummy/laptop
cd /mnt/data/backup/dummy/laptop
duplicacy init LaptopDummy /mnt/data/backup/laptop/

This resulted in the following error

Failed to download the configuration file from the storage: The storage is likely to have been initialized with a password before

duplicacy init LaptopDummy //mnt/data/backup/laptop/
results in the same error.

I can’t figure out what I’m doing wrong. Any help is greatly appreciated.


#15

On the server run init with encryption:

duplicacy init -e LaptopDummy /mnt/data/backup/laptop/

