Backing Up Large Datasets to Both Local and Backblaze B2 Destinations Using Duplicacy CLI on Linux

#1

Backblaze B2 has become a popular cost-effective online storage mechanism, and is typically less expensive than competing services such as Amazon S3.

This wiki describes how to take a large amount of data and back it up to both a local backup, such as an external hard drive or NAS, and a Backblaze B2 account using the CLI on Linux.

The first step is to create a Backblaze account, and sign up for B2 storage. You will receive a B2 Account ID and a B2 Application Key from Backblaze. Next, from Backblaze’s website, create a B2 bucket. The name for the bucket must be unique across all buckets by all Backblaze B2 users. Substitute this bucket name for the placeholder MY-B2-BUCKET-NAME in this wiki. (Backblaze B2 buckets only allow alphanumeric characters and hyphens.)

And last, if you’re going to be backing up more than the free amount of B2 storage (10 GB as of this writing), then on Backblaze’s website you will need to go to B2 Cloud Storage -> Caps and Alerts, and adjust the maximum daily storage cap. As of this writing (2018-Jun) the pricing for Backblaze B2 storage is $0.005/GB/month. Assuming 30 days/month, this amounts to $0.1667/TB/day, so a cap of $2 per day in storage cost allows for up to 12 TB.

The current download costs are $0.01/GB past 1 GB per day, so the download caps may also need to be adjusted when restoring data, for instance.
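
The storage-cap and download arithmetic above can be reproduced with a few lines of shell. This is only a sketch: it assumes the 2018-Jun prices quoted in this wiki, 1 TB = 1000 GB, and a 30-day month, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Prices quoted in this wiki (2018-Jun); adjust them if Backblaze's pricing changes.
STORAGE_USD_PER_GB_MONTH=0.005
DOWNLOAD_USD_PER_GB=0.01
FREE_DOWNLOAD_GB_PER_DAY=1

# Largest dataset (in TB) that fits under a given daily storage cap (in USD).
max_tb_for_daily_cap() {
    awk -v cap="$1" -v price="$STORAGE_USD_PER_GB_MONTH" \
        'BEGIN { printf "%.1f\n", cap / (price * 1000 / 30) }'
}

# Cost (in USD) of downloading the given number of GB within a single day.
download_cost_usd() {
    awk -v gb="$1" -v price="$DOWNLOAD_USD_PER_GB" -v free="$FREE_DOWNLOAD_GB_PER_DAY" \
        'BEGIN { billable = gb - free; if (billable < 0) billable = 0
                 printf "%.2f\n", billable * price }'
}

max_tb_for_daily_cap 2     # prints 12.0
download_cost_usd 3400     # prints 33.99
```

The second call treats the 3.4T sample dataset as roughly 3400 GB restored in one day, so the download cap would need to be raised to about $34 for such a restore.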

Identify the directory to be backed up:

[root@mycomputer ~]# cd /path/to/my/data
[root@mycomputer data]# pwd
/path/to/my/data

Let’s see how much data is to be backed up:

[root@mycomputer data]# du -s -h
3.4T

(Your results will reflect the amount of data contained under your current directory, which is likely to be different than this sample amount.)

Set up local storage and repository

The first step will be to initialize the duplicacy backups at the directory to be backed up (repository in duplicacy terminology).
The “-e” option indicates that this data will be encrypted with a password, so enter (and re-enter to confirm) the desired encryption password when prompted by the duplicacy init command. Assuming the destination for the backed-up data is “/path/to/local/backup/location”, you would issue the following command:

[root@mycomputer data]# duplicacy init -e data_backup /path/to/local/backup/location
Enter storage password for /path/to/local/backup/location/:*********************************
Re-enter storage password:*********************************
/path/to/my/data will be backed up to /path/to/local/backup/location with id data_backup
The following preferences file is then created:
[root@mycomputer data]# cat .duplicacy/preferences

[
    {
        "name": "default",
        "id": "data_backup",
        "storage": "/path/to/local/backup/location/",
        "encrypted": true,
        "no_backup": false,
        "no_restore": false,
        "no_save_password": false
    }
]

Now change the name default to something more descriptive. This will be the locally-connected backup (the external hard drive/NAS), so rename it to describe exactly what this storage is:

[root@mycomputer data]# perl -pi.bak -e 's/default/my_external_hard_drive_backup/' .duplicacy/preferences

The preceding command uses a Perl one-liner to substitute text in-place in a file. The command also creates a backup file, .duplicacy/preferences.bak, that contains the original preferences file (in case something goes wrong here).
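
Note that the pattern s/default/…/ replaces the first “default” on every line, so it would also change an id or storage path that happens to contain that word. As a hedged alternative, the sketch below swaps in sed (not the Perl one-liner above) and matches the whole "name" key/value pair. The rename_storage helper and the /mnt/default_disk path are hypothetical, and the demonstration runs on a scratch file rather than the real preferences file.

```shell
#!/usr/bin/env bash
# Rename a storage in a Duplicacy preferences file by matching the complete
# "name" key/value pair. The original is kept as <file>.bak.
# Assumes the names contain no "/" or other characters special to sed.
rename_storage() {
    local prefs=$1 old=$2 new=$3
    sed -i.bak "s/\"name\": \"$old\"/\"name\": \"$new\"/" "$prefs"
}

# Demonstration on a scratch file whose (hypothetical) storage path contains "default".
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
[
    {
        "name": "default",
        "id": "data_backup",
        "storage": "/mnt/default_disk/backup/"
    }
]
EOF
rename_storage "$scratch" default my_external_hard_drive_backup
cat "$scratch"
```

Only the "name" line changes; the storage path keeps its “default” and the untouched original is left next to the scratch file with a .bak suffix.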

Here's the preferences file after the name change:
[root@mycomputer data]# cat .duplicacy/preferences

[
    {
        "name": "my_external_hard_drive_backup",
        "id": "data_backup",
        "storage": "/path/to/local/backup/location/",
        "encrypted": true,
        "no_backup": false,
        "no_restore": false,
        "no_save_password": false
    }
]

Set up Backblaze storage and repository

Now add a second storage location for the Backblaze B2 bucket:

[root@mycomputer data]# duplicacy add -e backblaze_b2_data data_backup b2://MY-B2-BUCKET-NAME

(Consider also adding the -copy my_external_hard_drive_backup --bit-identical options to make the B2 and local backups copy-compatible.)

Now enter the B2 account ID, application key, and encryption password when prompted:

Enter Backblaze Account ID:MY_BACKBLAZE_ACCOUNT_ID
Enter Backblaze Application key:MY_BACKBLAZE_APPLICATION_KEY
Enter storage password for b2://MY-B2-BUCKET-NAME:*********************************
Re-enter storage password:*********************************
/path/to/my/data will be backed up to b2://MY-B2-BUCKET-NAME with id data_backup

Here's the preferences file after adding Backblaze:
[root@mycomputer data]# cat .duplicacy/preferences

[
    {
        "name": "my_external_hard_drive_backup",
        "id": "data_backup",
        "storage": "/path/to/local/backup/location/",
        "encrypted": true,
        "no_backup": false,
        "no_restore": false,
        "no_save_password": false
    },
    {
        "name": "backblaze_b2_data",
        "id": "data_backup",
        "storage": "b2://MY-B2-BUCKET-NAME",
        "encrypted": true,
        "no_backup": false,
        "no_restore": false,
        "no_save_password": false
    }
]

Now let’s load the Backblaze account ID, application key, and encryption password for the B2 storage into the preferences file to enable set-and-forget backups:

[root@mycomputer data]# duplicacy set -storage backblaze_b2_data -key b2_id -value MY_BACKBLAZE_ACCOUNT_ID
New options for storage b2://MY-B2-BUCKET-NAME have been saved
[root@mycomputer data]# duplicacy set -storage backblaze_b2_data -key b2_key -value MY_BACKBLAZE_APPLICATION_KEY
New options for storage b2://MY-B2-BUCKET-NAME have been saved
[root@mycomputer data]# duplicacy set -storage backblaze_b2_data -key password -value "MY_ENCRYPTION_PASSWORD"
New options for storage b2://MY-B2-BUCKET-NAME have been saved

Here's the preferences file with all the Backblaze config:
[root@mycomputer data]# cat .duplicacy/preferences

[
    {
        "name": "my_external_hard_drive_backup",
        "id": "data_backup",
        "storage": "/path/to/local/backup/location/",
        "encrypted": true,
        "no_backup": false,
        "no_restore": false,
        "no_save_password": false
    },
    {
        "name": "backblaze_b2_data",
        "id": "data_backup",
        "storage": "b2://MY-B2-BUCKET-NAME",
        "encrypted": true,
        "no_backup": false,
        "no_restore": false,
        "no_save_password": false,
        "keys": {
            "b2_id": "MY_BACKBLAZE_ACCOUNT_ID",
            "b2_key": "MY_BACKBLAZE_APPLICATION_KEY",
            "password": "MY_ENCRYPTION_PASSWORD"
        }
    }
]

The preferences file now has enough information stored in it to be able to backup to a B2 account and not require user interaction.

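Since the preferences file contains passwords, let’s lock down access to it. A plain chmod -R 600 would also strip the execute (search) bit from the .duplicacy directory itself, so a non-root user could no longer enter it. The symbolic mode below keeps directories enterable for the owner while removing all access for group and others:

[root@mycomputer data]# chmod -R u=rwX,go= .duplicacy/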
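
If you want to see what a recursive permission change does before touching the real repository, you can try it on a scratch directory first. The following sketch applies the symbolic mode u=rwX,go=, which leaves regular files at 600 and directories at 700. The lock_down helper is a made-up name, and stat -c assumes GNU coreutils, as found on most Linux systems.

```shell
#!/usr/bin/env bash
# Restrict a directory tree to its owner. Capital X grants execute only to
# directories (and to files that are already executable), so directories
# remain enterable while plain files end up read/write for the owner only.
lock_down() {
    chmod -R u=rwX,go= "$1"
}

# Demonstration on a scratch copy of the layout, not the real repository.
scratch=$(mktemp -d)
mkdir "$scratch/.duplicacy"
touch "$scratch/.duplicacy/preferences"
lock_down "$scratch/.duplicacy"

# Show the resulting octal modes (GNU stat syntax).
stat -c '%a %n' "$scratch/.duplicacy" "$scratch/.duplicacy/preferences"
```

The directory should be listed with mode 700 and the preferences file with mode 600.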

Running the local backup

Now let’s start the local backup first.

Depending on the amount of data to be backed up, the computer’s processing power, and the data connection speed to the external hard drive, it may take some time (possibly on the order of several hours) to complete.

[root@mycomputer data]# duplicacy backup -threads 2 -stats -storage my_external_hard_drive_backup

Enter the encryption password when prompted.

Storage set to /path/to/local/backup/location/
No previous backup found
Indexing /path/to/my/data
 […]
 [lots of “packed” lines…]
 […]
Backup for /path/to/my/data at revision 1 completed

You can experiment with the number of threads used for this task (we assumed 2 here) to minimize the time required for the backup. Be careful: on a spinning hard disk (HDD), many concurrent threads tend to reduce throughput rather than increase it, because the drive has to seek between the competing reads and writes.
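
One way to compare thread counts without waiting for a full backup is Duplicacy's benchmark command. The sketch below assumes your installed version provides duplicacy benchmark with the -storage, -upload-threads and -download-threads options, and that it is run from the repository directory. The benchmark_threads helper and the DRY_RUN switch are made up for illustration; by default the commands are only printed.

```shell
#!/usr/bin/env bash
# Print (or, with DRY_RUN=0, actually run) one `duplicacy benchmark` per
# thread count so the reported speeds can be compared.
DRY_RUN=${DRY_RUN:-1}

benchmark_threads() {
    local storage=$1
    shift
    local n
    for n in "$@"; do
        local cmd=(duplicacy benchmark -storage "$storage"
                   -upload-threads "$n" -download-threads "$n")
        if [ "$DRY_RUN" = 1 ]; then
            echo "${cmd[*]}"      # dry run: show the command only
        else
            "${cmd[@]}"           # real run: execute the benchmark
        fi
    done
}

benchmark_threads my_external_hard_drive_backup 1 2 4
```

Run it once as shown to review the commands, then with DRY_RUN=0 to execute them and pick the thread count with the best reported throughput.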

Running the backblaze backup

Now that we have a local backup using duplicacy, it’s time to create the online backup to the B2 bucket. Depending on the amount of data being backed up and the upload speed, the duration required for this might be measured in weeks or even months. The following command has been tested using bash and might not apply to all possible shells:

[root@mycomputer data]# nohup duplicacy backup -threads 10 -stats -storage backblaze_b2_data > /path/to/logfile/for/this/backup 2>&1  &
[1] 27719
[root@mycomputer data]#

Let’s dissect this last command:

  • nohup: Continue running this process even after the user logs out, as long as the computer is still running. This is helpful for backing up very large datasets, particularly with slower upload speeds.

  • duplicacy backup -threads 10 -stats -storage backblaze_b2_data > /path/to/logfile/for/this/backup: Initiate a backup using 10 threads, show stats at the end, and backup to the B2 bucket that was set up previously.
    Send the outputs to the path to the logfile indicated.

  • 2>&1: Redirect stderr to stdout, so anything written to stderr ends up in the logfile as well

  • &: Start this as a background process

As mentioned above, you can experiment with the number of threads used for this task (we assumed 10 here) to minimize the time required for the backup. We chose 10 here (instead of 2) because the bottleneck in this backup is the network transfer to Backblaze, which is much slower than copying to your local HDD.
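
If you start long backups regularly, the nohup pattern dissected above can be wrapped in a small bash function. This is just a sketch: start_detached and DETACHED_PID are made-up names, and stdin is additionally redirected from /dev/null so the job never waits on the terminal.

```shell
#!/usr/bin/env bash
# Start a command detached from the terminal, send stdout and stderr to a
# logfile, and remember the PID of the background job in DETACHED_PID.
start_detached() {
    local logfile=$1
    shift
    nohup "$@" > "$logfile" 2>&1 < /dev/null &
    DETACHED_PID=$!
}

# Usage (paths and storage name as in this wiki):
#   start_detached /path/to/logfile/for/this/backup \
#       duplicacy backup -threads 10 -stats -storage backblaze_b2_data
#   echo "backup running as PID $DETACHED_PID"
#   kill -0 "$DETACHED_PID" && echo "still running"
```

Keeping the PID around makes it easy to check later whether the backup is still running, or to stop it with kill.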

Monitor backblaze upload

You can periodically monitor the backup by looking at the current logfile created:

[root@mycomputer data]# tail /path/to/logfile/for/this/backup
Uploaded chunk 4062 size 14940431, 1.24MB/s 21 days 16:40:28 0.8%
Uploaded chunk 4067 size 2498829, 1.24MB/s 21 days 16:46:32 0.8%
Uploaded chunk 4064 size 3486015, 1.24MB/s 21 days 16:47:06 0.8%
Uploaded chunk 4068 size 2619000, 1.24MB/s 21 days 16:45:05 0.8%
Uploaded chunk 4069 size 3637340, 1.24MB/s 21 days 16:55:20 0.8%
Uploaded chunk 4066 size 12429227, 1.24MB/s 21 days 16:54:05 0.8%
Uploaded chunk 4070 size 9584627, 1.24MB/s 21 days 16:41:25 0.8%
Uploaded chunk 4073 size 2962527, 1.24MB/s 21 days 16:48:43 0.8%
Uploaded chunk 4072 size 5215431, 1.24MB/s 21 days 16:48:39 0.8%
Uploaded chunk 4071 size 6534516, 1.24MB/s 21 days 16:48:33 0.8%

(This is sample data – your log should look similar, but obviously not the same as this.)
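
For a quick summary instead of raw lines, the log can be condensed with awk. The sketch below assumes the “Uploaded chunk …” line format shown in the sample above; summarize_upload_log is a made-up helper name.

```shell
#!/usr/bin/env bash
# Summarize the "Uploaded chunk ..." lines of a Duplicacy backup log:
# number of chunks, total bytes uploaded, and the latest progress figure.
summarize_upload_log() {
    awk '/^Uploaded chunk/ {
             chunks++
             bytes += $5 + 0     # field 5 is the size plus a trailing comma; + 0 keeps the number
             progress = $NF      # last field, e.g. "0.8%"
         }
         END { printf "%d chunks, %.0f bytes, %s complete\n", chunks, bytes, progress }' "$1"
}

# Usage:
#   summarize_upload_log /path/to/logfile/for/this/backup
```

Fed the first three sample lines above, it prints “3 chunks, 20925275 bytes, 0.8% complete”.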

Use copy instead of 2 backup commands

This guide teaches you how to set up your backup strategy, but it has one possible issue: files may be modified between the 2 backup runs, so the two backups can differ.

This may be perfectly fine if you don’t need every file to be exactly the same in both storages (since we all expect never to need to restore a backup :) ).

If you do need the backups to be identical, you will have to adapt the instructions above slightly:

  • use the add -copy my_external_hard_drive_backup --bit-identical command
  • use the copy command instead of the second backblaze backup

These 2 steps are explained in more detail in the following how-to: Back up to multiple storage
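
As a rough outline of the copy-based variant, the sketch below prints the three commands in order so you can review them before running anything. It assumes the storage names used in this wiki and the -from, -to and -threads options of duplicacy copy; the copy_workflow helper is made up for illustration. The add command here takes the place of the earlier duplicacy add, since the B2 storage has to be created as copy-compatible from the start.

```shell
#!/usr/bin/env bash
# Print the command sequence for backing up locally once and then copying
# that backup to the remote storage, instead of running two separate backups.
copy_workflow() {
    local local_storage=$1 remote_storage=$2 snapshot_id=$3 remote_url=$4
    cat <<EOF
duplicacy add -e -copy $local_storage -bit-identical $remote_storage $snapshot_id $remote_url
duplicacy backup -threads 2 -stats -storage $local_storage
duplicacy copy -from $local_storage -to $remote_storage -threads 10
EOF
}

copy_workflow my_external_hard_drive_backup backblaze_b2_data data_backup b2://MY-B2-BUCKET-NAME
```

Because the copy reads from the local storage rather than from the live files, both storages end up with the same revisions even if the data changes in the meantime.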


#2

Thank you for posting this how-to! For future first-timers like me, I thought I would clarify that the way the B2 UI is currently set up, you need to request a new “master application key” using the blue button in the screenshot to use as the application key in the CLI. The regular application key would not work for me.


#3

You are correct.
Duplicacy 2.1.1 doesn’t support application keys; however, the next version of Duplicacy will: Backblaze B2 Application Keys.


#4

FreeNAS has a cloud sync plug-in (which I believe is an rclone package). Could I use Duplicacy CLI to backup my local PCs to one, consistent destination in the FreeNAS box, and then just use the rclone to “dumb replicate” that to B2?

There are a few catches to this, like running rclone at a “consistent” time, not during any Duplicacy backup. But this would save having to add the additional repository config and having to run not only a “copy” but also the necessary prune commands. It would also ease setup on the FreeNAS box.

Would I not be able to restore directly from B2 in this case if the FreeNAS box died?

Thanks!


#5

I am not sure how that dumb replicate works on your NAS.

In theory it could work as long as the folders are perfectly in sync. For example, if you prune on your NAS but rclone doesn’t remove the same chunks and only adds the new ones, then you will keep filling more and more space in B2 (which will cost more and more money).

You have to give us a few more details about your idea, but from what I understand right now, using duplicacy to do the copy and prune seems the sensible option.


#6

But I guess the short answer is: use duplicacy copy for this unless you know exactly what you’re doing, right?