Back up to multiple storages

There is always a danger of losing all your data if it is stored in only one place. It is therefore highly recommended to back up your data to at least two different storage providers. Duplicacy provides a built-in way to make this kind of backup redundancy super easy.

When a repository is initialized, you always provide the URL of the default storage, which will itself be initialized if it hasn’t been already. (:bulb: Even though you are not told so directly, this storage is given the name `default` for easy access.)

cd /path/to/repository
duplicacy init repository_id onsite_storage_url

You can add additional storage providers to this repository by running the add command.
The first argument to add is the name of the new storage, which other commands can then use (instead of the usual `default`).

# add an additional storage named `offsite_storage`:

duplicacy add offsite_storage repository_id offsite_storage_url

Now when you run the backup command, by default the backup will be stored on the default storage:

duplicacy backup

Therefore, when you want to back up to the new storage, you have to select it explicitly when running backup, like this:

duplicacy backup -storage offsite_storage

This works, but it is not the best practice, for two reasons:
First, you are running the backup twice: once for the default storage and a second time for the offsite_storage storage, consuming double the CPU and disk resources and taking longer for what is a single redundant backup.
Second, if some files change between these two backup commands (e.g. you rename a picture, or change the rating of a song), you get two different backups, making the management of backups on these two storages a bit more complex.


The recommended way is to use the copy command to copy from the default storage to the additional storage (offsite_storage). This way, you’ll always get identical backups on both storage providers:

duplicacy copy -from default -to offsite_storage

Of course you may be able to use third-party tools, such as rsync or rclone, to copy the content of one storage to another (:grey_exclamation: in this case don’t forget about using -bit-identical as explained here).
But compared with rsync/rclone, the copy command can copy only a selected set of revisions instead of everything. Moreover, if the two storages are set up differently (such as when one is encrypted and the other is not), the copy command is your only choice.
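
For example, to copy only certain revisions rather than the whole storage, you can combine the -id and -r options of the copy command (a minimal sketch; the snapshot id and revision numbers here are placeholders):

# copy only revisions 5 and 7 of this repository, instead of
# every revision of every repository on the default storage:
duplicacy copy -id repository_id -r 5 -r 7 -from default -to offsite_storage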


It is also possible to run the copy command directly on your onsite storage server. All you need to do is create a dummy repository there, initialize it with the same default storage (but as a local disk), and add the same additional storage:

# log in to your local storage server
mkdir -p /path/to/dummy/repository
cd /path/to/dummy/repository
duplicacy init repository_id onsite_storage_url
duplicacy add -copy default -bit-identical offsite_storage repository_id offsite_storage_url
duplicacy copy -from default -to offsite_storage

This not only frees your work computer from running the copy command, but also speeds up the copy process, since Duplicacy can now read chunks from a local disk
(local server -> router -> offsite_storage)
instead of over the network
(local server -> router -> your computer -> router -> offsite_storage).
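
If you want this copy to run unattended, one option is a cron job on the storage server (a sketch; the schedule is arbitrary and it assumes the duplicacy binary is on cron’s PATH):

# run the copy every night at 3am from inside the dummy repository
0 3 * * * cd /path/to/dummy/repository && duplicacy copy -from default -to offsite_storage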

Multiple cloud storages without a local storage

If you’re backing up to two cloud storage providers without a local storage, issuing a copy command between the two cloud storages will cause all data to be downloaded from one cloud storage and uploaded to the other.

This can be slow, and may incur extra download costs.
To avoid this - while maintaining identical backups on both storage destinations - you can add the destination storage twice, under two different snapshot ids.

One snapshot id is used to make “direct” backups to the destination cloud storage (you can also think of this snapshot id as a dummy – read on), and the other is used to copy the real backups from the source storage to the destination storage.

This works better because a “direct” (dummy) backup should share most of its chunks with the (real) backup produced later by the copy operation (even if some files change between the direct backup and the copy).

Since the upload of files to the second storage is done by the backup to the dummy snapshot id rather than by the copy to the real snapshot id, when the copy command is run only the (very few) chunks that changed between the two backups have to be downloaded from the first storage, significantly reducing the amount of download traffic.

(:bulb: this trick relies on the fact that most storage providers offer free uploads and charge only for downloads - do check whether this is the case for your providers as well!)

duplicacy init -storage-name backblaze my-backups b2://bucket
duplicacy add -copy backblaze -bit-identical wasabi_real_storage my-backups wasabi://bucket    # used for copying the real backups
duplicacy add -copy backblaze -bit-identical wasabi_dummy_storage my-backups_dummy wasabi://bucket      # used for the direct/dummy backups
duplicacy backup -storage backblaze
duplicacy backup -storage wasabi_dummy_storage
duplicacy copy -from backblaze -to wasabi_real_storage

Pruning

It is worth mentioning that the copy command is non-destructive, so data pruned from one storage will not automatically be pruned from its copy.

Example: duplicacy copy -from onsite -to offsite

For a system running regular copy and prune operations, the following scenarios are possible:

  • If pruning onsite only, the offsite storage will never be pruned.
  • If onsite pruning is equal to offsite pruning, this is perfectly fine (see the sketch after this list).
  • If onsite pruning is more aggressive than offsite pruning, this works (but is not a great idea).
  • If onsite pruning is less aggressive than offsite pruning, this also works (but it is inefficient to keep copying data that will imminently be pruned). If you want to keep the offsite storage lighter than the onsite one, you need to copy specific revision numbers during copy.
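
For the “equal pruning” case, you can simply run the same retention policy against both storages (a minimal sketch; the -keep values are just an example policy, and the storage names match the copy example above):

# on both storages: delete revisions older than 180 days, and keep
# one revision every 7 days for revisions older than 30 days
duplicacy prune -all -keep 0:180 -keep 7:30 -storage onsite
duplicacy prune -all -keep 0:180 -keep 7:30 -storage offsite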

@TheBestPessimist - I think the command listed above

duplicacy add -copy default offsite_storage --bit-identical repository_id offsite_storage_url

should actually be

duplicacy add -copy default -bit-identical offsite_storage repository_id offsite_storage_url

So, if I want to make a second copy of an entire storage which holds backups of multiple repos, then I need to do this for every repository I ever took a backup of? … It feels very tedious and roundabout, and having to maintain these dummies will become a pain in the long term.

Why not have a command that can be pointed at a storage to back up the entire storage to a second location? It could optionally take a list of repos one wants to selectively back up to the second storage, allowing for specificity if needed.

You just need to create one dummy repository. By default the copy command copies everything from all repositories in the source storage to the destination storage, unless you specify the -r and -id options to cherry-pick specific revisions.
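
For example (a sketch with placeholder names), the same dummy repository can copy everything, or cherry-pick one repository’s revisions:

# copy every revision of every repository:
duplicacy copy -from default -to offsite_storage

# copy only revisions 1 through 10 of a single repository:
duplicacy copy -id some_repository_id -r 1-10 -from default -to offsite_storage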


Does this mean that the chunk count and the size of the two storages will be the same? Somehow, after I ran the commands, I get the following:

  • original storage: 79,688 chunks
  • second storage using the copy command: 79,690 chunks

Looking at the data inside, it all seems to match, but the two storages also differ by 2 MB in size (out of 400 GB), so I am not sure if I should run the command again?

Running the check command says it’s all good, and doing a diff between both storages shows no difference.