Back up to multiple storages


There is always a danger of losing all your data if it is stored in only one storage. It is therefore highly recommended to back up your data to at least two different storage providers. Duplicacy provides a unique tool that makes this backup redundancy easy.

When a repository is initialized, you always provide the URL of the default storage, which will be initialized if it hasn’t been already. (:bulb: Even though you are not told directly, this storage is named default for easy access.)

cd /path/to/repository
duplicacy init repository_id onsite_storage_url

You can add additional storage providers to this repository by running the add command.
The first argument to add is the name of the new storage that can be used by other commands (instead of the normal default).

# add an additional storage named `offsite_storage`:

duplicacy add offsite_storage repository_id offsite_storage_url

Now when you run the backup command, by default the backup will be stored to the default storage:

duplicacy backup

Therefore, when you want to back up to the new storage, you have to select it explicitly when running backup:

duplicacy backup -storage offsite_storage

This works, but is not the best practice for two reasons:
First, you are running the backup twice: once for the default storage and a second time for the offsite_storage storage. This consumes double the CPU and disk resources and takes longer for what is a single redundant backup.
Second, if some files change between these two backup commands (e.g. you edit the name of a picture, or the rating of a song), you get two different backups, which makes managing the backups on these two storages a bit more complex.


The recommended way is to use the copy command to copy from the default storage to the additional storage (offsite_storage). This way, you’ll always get identical backups on both storage providers:

duplicacy copy -from default -to offsite_storage
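
The backup-then-copy sequence can be wrapped in a small script so that the copy only runs when the backup succeeded. This is a minimal sketch, not part of Duplicacy: the function name backup_then_copy and the DUPLICACY variable are made up for illustration, and it assumes you run it from the repository directory with both storages already added and compatible for copying.

```shell
#!/bin/sh
# Hypothetical helper, not part of Duplicacy.
# DUPLICACY can point at a different binary (an assumption for illustration).
DUPLICACY="${DUPLICACY:-duplicacy}"

# backup_then_copy SOURCE DESTINATION
backup_then_copy() {
    src="$1"
    dst="$2"
    # Back up once, to the source storage only.
    "$DUPLICACY" backup -storage "$src" || return 1
    # Then replicate the resulting revisions to the destination storage.
    "$DUPLICACY" copy -from "$src" -to "$dst"
}

# Example (run from the repository directory):
#   backup_then_copy default offsite_storage
```

Stopping when the backup fails avoids spending time on a copy that has nothing new to transfer.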

Of course you may be able to use third-party tools, such as rsync or rclone, to copy the content of one storage to another (:grey_exclamation: in this case don’t forget about using --bit-identical as explained here).
But compared with rsync/rclone, the copy command can copy only a selected set of revisions instead of everything. Moreover, if the two storages are set up differently (such as when one is encrypted and the other is not), then the copy command is your only choice.
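
To illustrate copying only selected revisions, here is a hedged sketch that passes one -r option per revision to the copy command. The function name copy_revisions and the DUPLICACY variable are hypothetical, and the revision numbers in the example are placeholders.

```shell
#!/bin/sh
# Hypothetical helper, not part of Duplicacy.
DUPLICACY="${DUPLICACY:-duplicacy}"

# copy_revisions SOURCE DESTINATION REVISION...
copy_revisions() {
    src="$1"
    dst="$2"
    shift 2
    args=""
    # Build one "-r <revision>" pair per requested revision.
    for rev in "$@"; do
        args="$args -r $rev"
    done
    # $args is intentionally unquoted so it splits into separate words;
    # this is safe here because revisions are plain integers.
    "$DUPLICACY" copy -from "$src" -to "$dst" $args
}

# Example (revision numbers are placeholders):
#   copy_revisions default offsite_storage 1 5 9
```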


It is also possible to run the copy command directly on your onsite storage server. All you need to do is to create a dummy repository there, and then initialize it with the same default storage (but as a local disk) and add the same additional storage:

# log in to your local storage server
mkdir -p /path/to/dummy/repository
cd /path/to/dummy/repository
duplicacy init repository_id onsite_storage_url
duplicacy add -copy default --bit-identical offsite_storage repository_id offsite_storage_url
duplicacy copy -from default -to offsite_storage

This not only frees your work computer from running the copy command, but also speeds up the copy process, since now Duplicacy can read chunks from a local disk (local server -> router -> offsite_storage) instead of over the network (local server -> router -> your computer -> router -> offsite_storage).

Multiple cloud storages without a local storage

If you’re backing up to two cloud storage providers without a local storage, issuing a copy command between the two cloud storages will cause all data to be downloaded from one cloud storage and uploaded to the other.

This can be slow, and may incur extra download costs.
To avoid this - while maintaining identical backups on both storage destinations - you can add the destination storage twice, with two different snapshot ids.

One is used to make “direct” backups to the destination cloud storage (you can think of this snapshot id as a dummy; read on), and the other is used to copy the real backups from the source storage to the destination storage.

This should work better because a “direct” (dummy) backup should share most of its chunks with the real backup that the copy operation transfers later; only files that changed between the two backups produce different chunks.

Since the files are uploaded to the second storage by the backup to the dummy snapshot rather than by the copy to the real snapshot, the copy command only has to download from the first storage the (very few) chunks that were modified between the two backups, which significantly reduces the download traffic.

(:bulb: this trick is based on the knowledge that most storage providers offer free upload and only the download costs money, hence you should check if this is the case for your providers as well!)

duplicacy init -storage-name backblaze my-backups b2://bucket
duplicacy add -copy backblaze --bit-identical wasabi_real_storage my-backups wasabi://bucket    # used for copying the real backups
duplicacy add -copy backblaze --bit-identical wasabi_dummy_storage my-backups_dummy wasabi://bucket      # used for direct/dummy backup
duplicacy backup -storage backblaze
duplicacy backup -storage wasabi_dummy_storage
duplicacy copy -from backblaze -to wasabi_real_storage

Pruning

It is worth mentioning that the copy command is non-destructive, so pruned data from one storage will not be automatically pruned on the copy.

Example: duplicacy copy -from onsite -to offsite

For a system running regular copy and prune operations, the following scenarios are possible:

  • If pruning onsite only, offsite storage will never be pruned.
  • If onsite pruning is equal to offsite pruning, this is perfectly fine.
  • If onsite pruning is more aggressive than offsite pruning, this would work (but is not a great idea).
  • If onsite pruning is less aggressive than offsite pruning, this would work (but it is inefficient to keep copying data that will soon be pruned). If you want to keep the offsite storage lighter than the onsite one, you need to specify revision numbers during copy.
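
For the “equal pruning” case, one way to keep the policies from drifting apart is to define the retention options once and apply them to every storage. This is only a sketch: the function name prune_all, the KEEP and DUPLICACY variables, and the retention values are assumptions chosen for illustration, so adjust them to your own policy.

```shell
#!/bin/sh
# Hypothetical helper, not part of Duplicacy.
DUPLICACY="${DUPLICACY:-duplicacy}"

# Example retention policy (an assumption, adjust to taste):
#   delete revisions older than 360 days,
#   keep one every 30 days for revisions older than 180 days,
#   keep one every 7 days for revisions older than 30 days,
#   keep one per day for revisions older than 7 days.
KEEP="-keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7"

# prune_all STORAGE...: apply the same policy to each named storage.
prune_all() {
    for storage in "$@"; do
        # $KEEP is intentionally unquoted so each option is a separate word.
        "$DUPLICACY" prune $KEEP -storage "$storage" || return 1
    done
}

# Example (run from the repository directory):
#   prune_all onsite offsite
```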
