There is always a danger of losing all your data if it is stored in only one place. It is therefore highly recommended to back up your data to at least two different storage providers. Duplicacy makes this kind of backup redundancy easy.
When a repository is initialized, you always provide the URL to the default storage, which will be initialized if it hasn't been. (Even though you are not told directly, this storage is given the name default for easy access.)
cd /path/to/repository
duplicacy init repository_id onsite_storage_url
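Since the default storage is named default, you can refer to it explicitly in other commands. For example, to list the snapshots stored on it (just a quick check against the repository initialized above):
duplicacy list -storage default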
You can add additional storage providers to this repository by running the add command.
The first argument to add is the name of the new storage, which can then be used by other commands (instead of the usual default).
# add an additional storage named `offsite_storage`:
duplicacy add offsite_storage repository_id offsite_storage_url
Now when you run the backup command, by default the backup will be stored to the default storage:
duplicacy backup
Therefore, when you want to back up to the new storage, you have to select it explicitly when running backup:
duplicacy backup -storage offsite_storage
This works, but is not best practice, for two reasons:
First, you are running the backup twice: once for the default storage and a second time for the offsite_storage storage. This consumes double the CPU and disk resources and takes longer, for what is a single redundant backup.
Second, if some files change between the two backup commands (e.g., you rename a picture or change the rating of a song), you get two different backups, making the management of backups on the two storages a bit more complex.
The recommended way is to use the copy command to copy from the default storage to the additional storage (offsite_storage). This way, you’ll always get identical backups on both storage providers:
duplicacy copy -from default -to offsite_storage
Of course, you may be able to use third-party tools, such as rsync or rclone, to copy the content of one storage to another (in this case, don’t forget to use -bit-identical, as explained here).
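For instance, a minimal rclone sketch, assuming the onsite storage is a local directory and an rclone remote named offsite has already been configured (both names are placeholders):
# mirror the raw storage directory to the remote, byte for byte
rclone sync /path/to/onsite_storage offsite:bucket/duplicacy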
But compared with rsync/rclone, the copy command can copy only a selected set of revisions instead of everything. Moreover, if the two storages are set up differently (such as when one is encrypted and the other is not), then the copy command is your only choice.
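For example, to copy just a couple of revisions (the revision numbers here are hypothetical):
duplicacy copy -r 5 -r 7 -from default -to offsite_storage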
It is also possible to run the copy command directly on your onsite storage server. All you need to do is create a dummy repository there, initialize it with the same default storage (but as a local disk), and add the same additional storage:
# log in to your local storage server
mkdir -p /path/to/dummy/repository
cd /path/to/dummy/repository
duplicacy init repository_id /path/to/onsite_storage   # the same default storage, now as a local disk
duplicacy add -copy default -bit-identical offsite_storage repository_id offsite_storage_url
duplicacy copy -from default -to offsite_storage
This not only frees your work computer from running the copy command, but also speeds up the copy process, since Duplicacy can now read chunks from a local disk (local_server -> router -> offsite_storage) instead of over the network (local_server -> router -> your computer -> router -> offsite_storage).
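If you want this server-side copy to run unattended, a crontab entry along these lines could work (the schedule and paths are placeholders):
# run the copy every night at 3:00
0 3 * * * cd /path/to/dummy/repository && duplicacy copy -from default -to offsite_storage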
Multiple cloud storages without a local storage
If you’re backing up to two cloud storage providers without a local storage, issuing a copy command between the two cloud storages will cause all data to be downloaded from one cloud storage and uploaded to the other. This can be slow, and may incur extra download costs.
To avoid this - while maintaining identical backups on both storage destinations - you can add the destination storage twice, with two different snapshot ids. One is used to issue “direct” backups (you can also think of this snapshot id as a dummy - read on) to the destination cloud storage, and the other is used to copy the real backups from the source storage to the destination storage.
This works better because a “direct” (dummy) backup should share many chunks with the copied (real) backup performed later by the copy operation (if files change between the direct backup and the copy).
Since the upload of files to the second storage is done by the backup to the dummy snapshot id rather than by the copy to the real snapshot id, when the copy command is run only the (very few) chunks that differ between the two backups have to be downloaded from the first storage, significantly reducing the download traffic.
(This trick relies on the fact that most storage providers offer free uploads and charge only for downloads, so check whether this is the case for your providers as well!)
duplicacy init -storage-name backblaze my-backups b2://bucket
duplicacy add -copy backblaze -bit-identical wasabi_real_storage my-backups wasabi://bucket   # used for copying the real backups
duplicacy add -copy backblaze -bit-identical wasabi_dummy_storage my-backups_dummy wasabi://bucket   # used for the direct/dummy backups
duplicacy backup -storage backblaze
duplicacy backup -storage wasabi_dummy_storage
duplicacy copy -from backblaze -to wasabi_real_storage
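To sanity-check that both storages now hold the same revisions, you can list the snapshots on each (using the storage names configured above):
duplicacy list -storage backblaze
duplicacy list -storage wasabi_real_storage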
Pruning
It is worth mentioning that the copy command is non-destructive, so data pruned from one storage will not automatically be pruned on the copy.
Example: duplicacy copy -from onsite -to offsite
For a system running regular copy and prune operations, the following scenarios are possible:
- If you prune onsite only, the offsite storage will never be pruned.
- If onsite pruning is equal to offsite pruning, this is perfectly fine.
- If onsite pruning is more aggressive than offsite pruning, this works (but is not a great idea).
- If onsite pruning is less aggressive than offsite pruning, this works too (but it is inefficient to keep copying data that will imminently be pruned). If you want to keep the offsite storage lighter than the onsite one, you need to use specific revision numbers during copy (see the sketch below).
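A minimal sketch of that last scenario, assuming the onsite/offsite storage names from the example above, and with hypothetical revision numbers and retention options:
# copy only a selected revision to offsite instead of everything
duplicacy copy -r 120 -from onsite -to offsite
# prune offsite more aggressively than onsite
duplicacy prune -storage offsite -keep 0:90 -keep 7:30
duplicacy prune -storage onsite -keep 0:360 -keep 7:30 -keep 1:7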