How to setup Duplicacy storage

ToddK · 10 January 2023 23:56

I have three clients I want to protect: a personal PC [Windows], a server [unRAID], and a router [Ubiquiti EdgeRouter]. I’m less interested in protecting against an “oopsie” day-to-day. For anything critical, I have a local git repository to retain versions, etc… Instead, I want to protect against the catastrophic failure.

My thought process is to have two lines of defense:

Have each client run it’s own backup targeting a staging area on the unRAID server.
Use Duplicacy to store the staged backups at a remote storage provider [B2].

However, I’m struggling a bit on the proper way to configure Duplicacy. I like the idea of having three separate storage locations; one for each client. The thought here is I can quickly see how much space is taken by each client. But, that 1) feels overly complex and 2) appears to fly in the face of Duplicacy’s de-duplication.

Should I backup the entire staging area to a single Duplicacy storage to a single B2 bucket? Should I backup each client to individual Duplicacy storages to a single B2 bucket? Use individual B2 buckets for each client?

What would be the recommendation for how to set this up? What sorts of things should I be keeping in mind as I create a recovery plan?

saspus · 11 January 2023 00:19

Save configuration from the router every time you make changes. There is no need for continuous backup of anything else.

Are there a lot of overlapping data?

Why would this matter or be of interest?

Backup of copy?

Don’t backup backup. Backup source data.

What are you trying to accomplish by this?

One duplicacy storaage location at your destination, eg. a bucket on B2 or Storj
One duplicacy instance on your PC backing up to the storage above, with snapshotID1
Another duplicacy instance on your unraid contraption, backing up to the same storage above, with SnapshotID2.

Keep credentials to the storage and duplicacy encryption password somewhere other than your PC and unraid server, in case both succumb to fire or what not.

You can also make it ransomware-proof by carefully crafting the B2 credentials to only allow upload and disallow change or delete, effectively making your backup immutable.

ToddK · 11 January 2023 03:22

I’ve added commit hooks before. Frankly, it takes too much time and having instant backups isn’t a priority. Again, I’m not worried about making and “oopsie” and having to quickly revert. I’m far more worried about a catastrophic failure. As such, not losing a day’s worth of work isn’t an issue.

I doubt it. The clients are sufficiently unique. I may wind up adding additional clients, though I wouldn’t necessarily expect that. That’s why I’m asking. I really don’t know if this is something I should be worried about.

Frankly, idle curiousity. If I ever feel my backups are getting “too large”, it would be easy to target a specific client. This is a stretch, to be sure.

I guess I was also thinking it might be easier for restores, but now I don’t think that’s true at all. It’s going to be “equivalent” either way.

Oh. Good question! Both. For the unRAID and edge router clients, it would be a copy. For the Windows client, it would be a backup. At least that’s what I’m thinking. Part of what I’m unsure about. Would it make more sense to have everything be a copy?

I think that implies everything in the staging area should actually be a copy. Is that a valid assumption here?

I’m guessing an equivalent approach would be to copy from the PC to the staging area and then a single Duplicacity instance on unRAID; using a single Duplicacity storage and single B2 bucket. I guess I do like a single point at which data is uploaded to B2; for no other reason that it’s consistent and I know exactly where the B2 data comes from.

Yessir! That’s one thing I’m trying to be very careful about; the ability to recover from a complete disaster.

I didn’t mention it previously, but I’m actually moving to Duplicacy from Duplicati. One of the primary motivations is the catastrophic scenario [fire, earthquake, flood, … ]. While I’ve saved my Duplicati configurations to B2, I honesly would much rather have the software take care of that.

Can you provide information on how this can be accomplished? While I’m typically very careful on what I click / run, you can never be 100% certain something doesn’t make it through. I would much rather be prepared instead of having to deal with the aftermath.

Thank for the thorough response @saspus . Lots of good information here. I’m feeling I’m about 80% the way there to a solid solution.

saspus · 11 January 2023 19:29

How often are you changing your router settings? I’ve setup mine years ago and did not need to touch it ever since

I think duplicacy reports size of each backup “snapshot id”, so you have some volubility, albeit mudded with overlapping data.

Right. But why have this extra copy in the first place? On each client you would need to ensure atomicity (e.g. take filesystem snapshot) then copy, then wait for copy to be done, then backup, and what if backup takes longer than time between copies? Then you need to create another local snapshot…. Too complex and fragile.

Running duplicacy on each client is straightforward, eliminates extra copy, and all that synchronization work. Duplicacy manages filesystem snapshots on its own (on Mac and windows; on Linux you have to write pre- and post- backup script yourself)

While I’ve saved my Duplicati configurations to B2,

kudos on this. I have no idea why people still use Duplicati, or why is it so high in google search results. It’s a flaky crap, with no official stabile version, mind you. I’ve tested it before, and it failed miserably. I don’t shit on it because we are on different backup program forum, I genuinely hate it with passion. There are many other good backup programs (Arq, restic, borg) so I’m not being duplicacy fanboy here.

I honesly would much rather have the software take care of that.

Elaborate here. What do you mean “take care of that”? Duplicacy’s encryption key is by design secret. Storing it at the storage defeats the purpose. If you don’t need that — don’t encrypt the duplicacy storage, then you only need to maintain access to your b2 account.

It does not matter how careful you are. There is no protection against zero day exploits. While I too agree it’s a bit paranoid to worry about such things — if may provide some peace of mind.

Have a look at this thread: How secure is duplicacy? - #30 by tallgrass

ToddK · 11 January 2023 21:07

Heh. You’re making my point for me. If not modifying very frequently, why is it necessary to backup after every commit?

I think perhaps my reason for backups is slightly different than yours. I want backups as a safety net and not some form of version control.

Restoring from B2 costs time and money. If I have a local copy, it’s trivial to quickly restore a file / configuration. At least that’s my rationalization. Fortunately, in all the time I’ve been making backups, I’ve not needed them. The best insurance policy is the one you don’t have to use.

My setup is non-standard. Out of the 3 clients I back up, only the Windows client is normal. The unRAID server, while based on slackware, is not a typical installation; only a portion of it needs backing up, interactions with docker require special handling, etc… Lastly, there’s no way to run Duplicacy on a router.

Since I must have some client copies, it seems better to be consistent. I’m not concerned about the extra copy [and the space it consumes]. I also don’t see any issues with backup time. I’m not copying that much data; on the order of 10’s of gigs. And that’s a full copy. Performing a nightly data sync will be significantly smaller.

For myself, I blame ignorance. Prior to Duplicati, I had no emergency plan and it was a very low barrier to entry. Now that I’ve dug deeper into this, I definitely want to steer clear of Duplicati.

Sorry for any confusion. I was referring to the need to explicitly save Duplicati’s configurations. That is not something the end user should need to do. The backup software itself should be able to restore the backup it wrote.

Couldn’t agree more. Thanks for the link. Will check out the thread.

I appreciate the guidance, @saspus .

At this point, the only thing I’m still not certain of is my schedule. Nightly, I run a backup to B2 and immediately afterward run a check. Weekly, I run a check with -chunks -fossils immediately followed by a prune.

saspus · 11 January 2023 21:29

That is precisely the reason I download configuration (to my Documents folder, that gets later backed up) from the router after every change! I don’t care about the diffs, but if my router burns in flames, I want to restore the latest configuration to the replacement, and not have to remember what config change could I have made after last time the config was exported. Less thinking → more reliability. By incorporating mindless “export config” step after every config change I guarantee that my router config in the backup is always up-to-date. Since I rarely if ever do that—the net benefit is positive. I take the same approach with all servers I manage, not just router. TrueNAS, cloud instances, etc. Change something → save the result.

Great. Let duplicacy on your PC to also back up to your NAS, in addition to the B2. Duplicacy supports multiple destinations. This will make you back up flows consistent and use the same tools.

That’s where duplicacy’s rather efficient filtering and pre-backup scripts come handy.

What interaction? You should not need to interact with it at all. All data that containers operate on is on the host. Containers themselves are disposable by design, no need to back up them.

Why not? But I agree, there is no need to involve router here at all, if you adopt save-config-after-every-change thingy. Or you can ssh to it in pre-backup script and fetch the config, but why bother?

In the name of consistency, use duplicacy also to back up your hosts to unraid. Or even better—only backup your hosts with duplicacy to unraid, and then use duplicacy copy to upload that backup to B2, so that your clients complete backup in the LAN, and unraid takes its time uploading it to B2. Either way—plain copy/sync in the middle of it feels jarring, and you still need to mess with snapshotting, otherwise your copy won’t be consistent.

Got it. Yes, Duplicacy does not go as far as storing all configuration as Kopia does, but what it stores is sufficient, in the sense that on the new machine you need only storage credentials and encryption passcode to restore and/or continue backup. It does not save filters file on the target, so if you have elaborate filters written—you’ll have to store them yourself. I solved it by keeping filter file somewhere that gets backed up, and in the duplicacy configuration I simply include that filter file.

Glad to help.

-chunks will force egress. Don’t do that. The only reason to do that is if you don’t trust B2 API to report failure when it failed to save the uploaded chunk. If you don’t trust B2 API—don’t use B2.

Start with not pruning, see how it goes. If you don’t prune—you can put all those safeguards as described in the other thread. Then if you see that you have tons of transient data and your backup grows too much—you can prune from some other trusted machine. But if you back up once a day—I don’t see how this can be the case. I personally also backup once a day and never prune. Storage is cheap—why should I be compelled to delete anything at all?

ToddK · 11 January 2023 21:58

Some great points. Thanks.

Oh. I was absolutely not planing on doing a “plain copy”; I was planning on something like duplicacy copy from Windows → unRAID. So, it seems we’re saying similar things? It’s just a matter of where the duplicacy copy happens; either locally or remotely.

I’ve been reading some other posts here and am now a little scared I haven’t saved enough to B2 / alternate storage. I’m running your duplicacy-web docker and there are a handful of .json files in the config directory. I’ll likely store the entire directory, but what would be the sheer minimum needed to restore from nothing? Would duplicacy.json + the master password be sufficient?

There you go. Already doing something I shouldn’t be. I do trust B2. And, as you say, if I didn’t, I wouldn’t be using them.

Does it still make sense to run a weekly check with -fossils? Or, should I add that to the nightly check run? Or skip it altogether?

I’ve just started my Duplicacy journey, so pruning at this point wouldn’t do anything anyways. I’m not making that many changes, so it’s reasonable to leave it off and see how the storage grows over time. Thanks for the suggestion.

Conceptually, there’s no point to delete. Practically, stuff becomes woefully obsolete. Fast. Kinda like when I cleaned up my bin of old computer parts last year and finally threw away the FDD and PATA cables I’d been holding onto.

saspus · 11 January 2023 23:07

Ah, then yes. Not sure however how can you duplicacy copy from windows to unpaid—I thought your windows is a client machine, not a backup target?

So I guess the differece is

Scenario 1

Windows machine:
- Duplicacy backup: source=local data, target=unraid(local repo)
- Duplicacy backup: source=local data, target=B2
unraid:
- Duplicacy backup: source=local data: target=B2

Scenario 2:

Windows machine:
- Duplicacy backup: source=local data, target=unraid
unraid:
- Duplicacy copy: source=local repo: target=B2
- Duplicacy backup: source=local data, target=B2

Yes? I prefer scenario 2 because clients have to do less work in this case.

Nothing in that directory is useful. The only useful thing is probably “filters” file, if you have configured any filters. The “master password” only encrypts application configuration locally. If its data is gone—saving that password is pointless . You only really need to have access to your B2 account (don’t even bother saving API keys and secrets, you can always generate new ones) and backup encryption password if you have used any. Maybe there is a way to restore duplciacy.json, etc., but I haven’t tried. Setting up from scratch in the event of failure is cleaner, and fewer data needs to be retained. There are some things in .json that are dependent on host (hostname, or machine id?, such as license-derived keys) it’s just not worth it to bother with that in my opinion. Adding new storage and new backup schedule takes one minute. I guess if you have a complex set of schedules and destination – maybe there is some value in preserving app state… but this raises a question—why do you have such as complex configuration

BTW, duplicacy follows first level symliks, so some opt to create one folder somewhere, and symlink everything that needs to be backed up into it. This maybe be simpler than trying to create filters and/or multiple backup jobs.

Simple check (with -fossils, which should be default really) verifies that all chunks that duplicacy thinks should be present are in fact present. It’s a “cheap” check—it only downloads snapshot chunks and list of files, and validates the integrity of duplicacy’s snapshots. If it passes – that means all files that snapshot refers to are in place, and therefore the backup is restorable. I don’t run it myself. But it may make sense to run it after every backup, or weekly, etc. If you don’t prune, and configure the bucket credentials in immutable way per that forum post—nothing can delete any files, so the check is also pointless.

Very true. But the tradeoff is—store woefully obsolete stuff indefinitely vs lift the finger to delete it. The former tends to be cheaper. And besides, there can be cases when stuff you just deleted suddenly becomes super important tomorrow.

Lolz. You should have waited few more decades and donated to some technology museum :). I do keep selective artifacts from the past though—like Dell Axim X5 PDA—that’s fully working, or my AMD Athlon CPU with motherboard with ISA bus to fit my (already then-obsolete US Robotics modem with that bus). That machine still boots and works. It has sentimental value to me.

ToddK · 12 January 2023 00:33

Awesome, because that’s the one I’ve implemented.

Oh. That’s great to hear. So, the only thing I need to recover from a complete disaster are two passwords; one for B2 and one for backup encryption. Is that correct?

One of the things I’ve been woefully negligent on is running through a “house burned down” restore scenario. It’s on my list to do that. Soon.

I actually back up all of the data for my dockers, so this is already covered by my unRAID backups. I won’t have it for the initial restore after a catastrophic failure. But, at that point, none of this is really important. The only thing that matters is getting the data back.

My schedule has gotten even more simplistic! I’ll probably still run the simple check [with -fossils]. I’m okay taking a “belt and suspender” approach here.

Wow! That’s really cool. And, probably the only legitimate reason to still be running Windows XP. Please, oh please, tell me you’re not running Windows 98!

saspus · 12 January 2023 00:58

Windows 95 OSR2 . Windows 98 even then was a horrific garbage…

sevimo · 12 January 2023 17:13

Windows 98 was alright. Windows Me on the other hand…

saspus · 12 January 2023 17:32

Oh yes, not sure what was the plan there.

To be fair, all the hate and frustration with windows stability pre Vista is rooted in a horrifically designed driver model: it was impossible to write a stable driver due to sheer amount of boilerplate code required to be done just right, lacking documentation, and the way kernel worked pre NT. Large companies were invited to MS HQ and aided in writing the driver software, but smaller companies had to wing it. The result were all those blue screens any time you looked at your machine funny.

So once you found the combination of drivers and OS that sort of kind of works for your environment—you tended to stick to it. Upgrading to next major OS had a very predictable outcome: few questionable UI changes, new bucket of bugs, and maybe some crucial features added (like, I don’t know, USB support).

In Vista Microsoft threw that model away and rewrote the whole stack from scratch, especially focusing on display driver model (WDDM) as the GPUs grew in complexity and their stability was paramount. Now drivers were merely extensions and developers just needed to implement device-specific functionality. The stability instantly improved, but this rather drastic change had the other side of multiple vendors not catching up in time and OS being rather new having all other childhood illnesses. Windows 7 was a polish on top of Vista, and many consider it to be one of the “best” OSes Microsoft produced.