Migrate from single offsite storage to onsite SFTP with copy to offsite storage

I currently have 3 machines backing up to B2 directly, and I would like to alter the setup so that those machines back up to an onsite SFTP server instead, which then copies the backups to B2, giving me identical and efficient onsite and offsite backups.

I’m thinking it might be possible to create a new local storage on the SFTP server and copy down from B2, but then will the client machines be able to back up directly to it? Or will they need to run the copy commands themselves? Ideally I’d like the SFTP server to run the copy operations rather than the clients, but I’m not sure if that’ll be possible based on other forum posts I’ve read.

It would be helpful if someone could give me a step-by-step/ELI5 explanation.

Thanks!

Sure!

Initialize a storage on your server that is copy-compatible with your B2 storage:

duplicacy add -copy <B2 storage name> <your server new storage name> <snapshot id> <storage url>
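
For example, with placeholder names filled in (assuming your existing B2 storage is already configured as b2 in the repository you run this from, local is the name for the new storage, dummy is the snapshot id, and /mnt/backups is the path on the server):

duplicacy add -copy b2 local dummy /mnt/backups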

Copy the contents from your B2 storage to your new local storage:

duplicacy copy -from <B2 storage name> -to <your server new storage name>
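
For instance, with the same placeholder names and an example thread count:

duplicacy copy -from b2 -to local -threads 8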

And finally, configure your computers' backup jobs to point to your local server. The server itself can execute the copy command to B2; just plan the job times and durations carefully so that the copy only runs after the backups have finished.
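
A minimal sketch of what that server-side job could look like, assuming the storages are named local and b2 in the directory where you did the add -copy, and assuming the client backups finish well before the copy starts (all names, paths and times here are placeholders):

#!/bin/sh
# copy-to-b2.sh: run from the directory on the server where both storages are configured
cd /home/backup/dummy || exit 1
duplicacy copy -from local -to b2 -threads 8

# example crontab entry, nightly at 03:00:
# 0 3 * * * /home/backup/copy-to-b2.sh >> /var/log/duplicacy-copy.log 2>&1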

For all the commands above, see the guide for additional options; it's worth using an appropriate number of threads, for example.


To copy from B2 without egress costs, I suggest using the integration with Cloudflare; it works very well.


Thanks! I am confused about the first add command though, since I don't believe there are any storages set up on the SFTP server to add to (I've never run init there to initialize a storage, only on other clients to back up to the SFTP server as a destination). Or are the client storage configurations somehow stored in the backup directories (snapshots/ etc.)?

The add -copy is basically the same as an init, but it copies the encryption and chunk details from the B2 storage's config file to the new local URL - resulting in an initialised local storage which is copy-compatible.

Though if I may suggest an alternative procedure…

After you do the add -copy, you could, in fact, then immediately point your backups to the new URL and use different backup IDs for each repository. Instead of expending bandwidth and money downloading chunks from B2, you’re basically pre-populating the local storage with mostly the same de-duplicated (deterministic) chunks that would be present if you did a copy.

And after you've pre-populated the local storage with initial backups, you could choose to copy the rest of the B2 snapshot history to local, or just not bother.

Either way, at this stage, you can proceed to set up a script on your local server to copy from local to B2 and you won’t have to re-upload most chunks.
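
A rough outline of that sequence, with placeholder names throughout (b2 and local for the storages, dummy for the snapshot id, /mnt/backups on the server):

# 1. on the server: create the copy-compatible local storage
duplicacy add -copy b2 local dummy /mnt/backups

# 2. on each client: add the storage as sftp://user@server//mnt/backups and run a
#    first backup under a NEW backup id - the deterministic chunking means most of
#    those chunks already exist in B2

# 3. optionally, pull the old B2 history down as well:
duplicacy copy -from b2 -to local

# 4. on the server, schedule the ongoing offsite copy (cheap, since B2 already has most chunks):
duplicacy copy -from local -to b2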


Perfect, I didn’t know duplicacy kept the storage config on the server itself, so that clarifies some things.

I'm pretty sure I understand your approach, I'll give it a shot and post back if I have any further questions. Thanks!

edit: actually, I just realized I'm not clear on how to point the client backups to the new copyable storage, assuming I initialized it on the SFTP server itself and not on the clients. So assuming I run duplicacy add -copy on the server, where exactly do I point the clients so they back up specifically to that storage?

Then on the clients, add a new storage in the Storage tab (green plus button at the bottom of the page), using the new SFTP URL.

It’s already initialised so you’re just adding it to the client configuration.

You also have a couple options here…

Adding a new storage with a new name means you’ll have to remove the old backup jobs and add them again, pointing to the new storage.

However, I believe you can shortcut this by removing the old storage and adding the new storage URL using the exact same storage name as the removed one. That way, your old backup jobs and schedules won’t have to be re-added.


So point the clients to the local directory where I ran duplicacy add -copy (and which contains .duplicacy/preferences etc.)? If that's the case, then that's another thing I had no idea worked that way.

No no… when you run add -copy, you had to supply the URL for the root of the new storage. Perhaps you did that locally on the storage server, but your clients need to be able to access that using a URL. If your plan is to provide that access via sftp, you need to set up the ssh/sftp to point to that storage location, via an sftp URL - e.g. sftp://user@server.local:2222//duplicacy.

Sorry yeah, I think that’s what I meant. If I initialized the storage at /mnt/backups, I could expose that via sftp and point the clients at sftp://<server>/mnt/backups for example.

I just didn't realize you could back up TO a directory you ran add -copy in; I was thinking it acted as the source instead and could only back up outwards from there.

Yup!

(The Web GUI makes it easy to construct the URL by filling in individual text fields, but you'll notice your final URL on the storage page probably has a double-slash // prefix in the path and is in user@server format.)

Hmm, you're not really backing up TO the location where you ran those commands from…

With your scenario, you want to use the CLI to copy from the local storage (remember: not the same location you're running the commands from) to B2. The location you're running Duplicacy from would normally be the repository / backup root, but you're not really backing up anything. This location is just an empty, dummy location, from which you did the add -copy and can later issue copy, and maybe even prune and check.

It just occurred to me you may have run add -copy from /mnt/backups - in which case, I guess that would work. :slight_smile: Though I'd strongly suggest keeping the storage separate from your dummy repository - I use something like ~/dummy. Not least because the .duplicacy subdirectory contains sensitive information about your B2 storage that you may not want your client machines to have access to! (Thinking ransomware rather than malicious users.)
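
As a rough sketch of that layout (bucket name, paths and storage names are all placeholders; init will prompt for the B2 credentials and the storage password):

mkdir -p ~/dummy && cd ~/dummy
# connect the dummy repository to the existing B2 storage (-e assuming it's encrypted)
duplicacy init -e -storage-name b2 dummy b2://my-bucket
# create the copy-compatible local storage at /mnt/backups, outside ~/dummy
duplicacy add -copy b2 local dummy /mnt/backups
# ~/dummy/.duplicacy holds the B2 details and is never exposed over SFTP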


OK, makes sense! One final thought I had…

Would your trick of seeding the data with an initial backup and then copying down revisions work from the clients instead? I.e., I add a copy storage on the SFTP server from each client, run an initial backup to it, and then use copy to grab only the revisions from B2? I realized it could actually be better to do the copying from the clients, because they use duplicacy-util, which would be aware if a backup job was running and thus wouldn't start the copy to B2 until it finished. Is that making any sense?

I really appreciate all your help by the way.

You can do the copy from any client with access to both storages, but the entire process is more efficient if you let a single Duplicacy instance do it all, since one copy run can cover all backup IDs and all revisions in one go.

Also, it doesn't matter if your clients are in the middle of backups (though I'd suggest scheduling the server to run the copy out of hours). You can do a copy and backups at the same time - it'll only process the snapshot revisions that exist in the source storage at run time.

Can you do a copy while other clients are pruning? Just want to make sure, because currently each client does a backup -> check -> prune as part of its nightly jobs.

Probably not recommended, no - though I don’t think it would result in irreversible failure, just a failed copy job, which it can retry later.

But personally I would only run one prune operation across all the clients anyway, and I would offload that job to your local server, along with check.
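
As a hedged sketch of what that might look like from the server's dummy repository (the -keep retention values below are only examples, not a recommendation):

# one prune per storage, covering every backup id (-a)
duplicacy prune -storage local -a -keep 0:360 -keep 30:90 -keep 7:30 -keep 1:7
duplicacy prune -storage b2 -a -keep 0:360 -keep 30:90 -keep 7:30 -keep 1:7
# and one check per storage
duplicacy check -storage local -a
duplicacy check -storage b2 -a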

Got it. Final question:

You said to use different backup IDs for each repository when doing the initial backup to SFTP. Did you mean just different between each of the clients (how it is now), or different from what's in B2 currently? It seems like if I used a different ID than what's in B2, then it wouldn't be possible to copy down the revision history later.

Different to B2, because your backups on your local storage will start at revision 1, and you'll have a conflict if that revision already exists on B2 (and even if it doesn't, there won't be a clean, sequential snapshot history when you copy revisions between the two storages).

Using different IDs doesn't stop you from copying down the old snapshots for restore purposes. And prune will remove them eventually anyway.

You could use the same IDs if you really wanted to… it’s just probably cleaner to start afresh while taking advantage of the de-duplication.

To use the same IDs, after doing the initial revision 1 backup(s) on the local server, you should manually delete those revision 1 snapshot files (or just the whole snapshots folder on the storage), so all you have locally are the chunks. Then you can copy the entire B2 revision history to local, and it'll allow you to keep backing up to those IDs and later reverse the copy direction.
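
In other words, something along these lines (the path and storage names are placeholders):

# after each client's initial revision-1 backup to the local storage:
rm -rf /mnt/backups/snapshots          # keep only the chunks locally
# pull the full B2 history down; clients carry on backing up under their existing ids
duplicacy copy -from b2 -to local
# later, the nightly server job reverses the direction: duplicacy copy -from local -to b2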


Perfect, thanks again!
