One backup job to multiple targets sharing same drive letter with different sync schedules?

Hi, I’d like to get more information about this use case before I retool and re-run my backup jobs. As part of my backup plan I currently have data being backed up to two physical USB hard drives that use the same drive letter (I manually plug them in, trigger the backup job, and unplug them). Because the backups aren’t run at the exact same time, one copy is always going to be a slightly newer version than the other.

Previously I was using Duplicati and this caused all kinds of issues. If I backed up to USB #1 and then tried to run the same job on USB #2, its internal database would be out of sync with what’s actually on USB #2. So I ended up making two backup jobs for the same data, one for each USB drive.

With Duplicacy I’m wondering if this will cause any problems. Say I create a single job (with the storage location at, let’s say, H:), back it up to USB #1 connected as H:, and then a week later back it up to USB #2, also connected as H:. Will Duplicacy be confused by the fact that USB #2 doesn’t have the changes that were pushed to USB #1 a week prior, or will it do a full scan of what’s on USB #2 every time and push the delta to USB #2? And likewise, when I go back and run USB #1 again, will it push the appropriate changes without regard to the fact that I ran a job against USB #2?

Thank you

Yes, it will work. Duplicacy does not have a database and does not make assumptions about the state of the target, because data can be pruned by other sources. Depending on how you create the second datastore, the local cache may be a mess, but that doesn’t matter because Duplicacy checks for chunk presence on the remote first. I would create the second storage as bit-identical (or simply copy the config file), just to be able to heal the data easily if needed.

However, I strongly discourage doing it, for many reasons unrelated to Duplicacy.

Instead, back up over the internet to some cloud destination. Since all your data fits on one physical disk, you don’t have much data in the first place, so the cost will be very low.

Single disks don’t provide any protection against data rot and bad sectors. You can mitigate that somewhat with erasure coding, but there are no guarantees. An external disk can randomly decide not to spin up one day. External disks are the least reliable ones and get stressed a lot, especially if you plan to haul them between locations. Basically, it’s a lot of failure modes for a very slim chance of being able to restore data in the end.

Use cloud services. (And under that umbrella I include a remote NAS that you control as well)


Thanks for the answer. That’s good to know about the lack of database.

Actually the external disks are only part of my backup plan. I also have cloud storage and data is replicated from my laptop to my NAS.

Regarding “I would create it as bit-identical (or simply copy a config file), just to be able to heal the data if needed easily.” - can you elaborate on what you mean here?

This is one of the selling features, if not the selling feature, of Duplicacy: it enables lockless, multithreaded, multi-machine backup to the same datastore. More here: duplicacy/duplicacy_paper.pdf at master · gilbertchen/duplicacy · GitHub

The data is encrypted with keys stored in the config file; that file in turn is encrypted with your password. When you create a bit-identical storage, the keys will be the same and hence produce the same chunks. This will allow you to retrieve a chunk corrupted by a disk failure from the other disk.

Otherwise the chunks will be different and you won’t be able to heal that backup; the only (reasonable) recourse would be to prune all revisions that reference the bad chunk.

When you add another target storage to the source, you can specify it to be bit-identical:
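For reference, the CLI equivalent is the -bit-identical flag of the add command, used together with -copy; the storage name (usb2), snapshot id (my-documents), and path below are just placeholders:

    # sketch: add a second storage that is copy-compatible and bit-identical to "default"
    duplicacy add -copy default -bit-identical usb2 my-documents H:/duplicacy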

BTW, why don’t you create two storage locations, each corresponding to one drive, so you can manage them separately? I don’t see the benefit of conflating them into the same destination when in reality they are different.

But if you still want to go with the original plan: back up to the first drive, then copy everything from it to the second drive, and that’s pretty much it.
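For example, if the second storage was added with -copy as above, the copy command can replicate snapshots between them (the storage names are placeholders); with a bit-identical storage, a plain file-level copy of the entire storage folder achieves the same thing:

    # sketch: copy all revisions from the first storage to the second
    duplicacy copy -from default -to usb2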

Then why waste time on a significantly more labor-intensive and much less reliable solution?

If that’s possible it sounds like the best solution. I’ve been using Duplicacy from the web GUI, which I understand is much more limited than the CLI, so maybe I just don’t know how to do it. The main thing I’m trying to avoid is creating two or three separate backup jobs for the same data, because my include/exclude rules and various options may change and I’d have to keep multiple backup jobs in sync with one another. It would be ideal if I could define a single backup routine (i.e. backup these files from this folder, etc.) and define/schedule multiple locations to send that backup job to. If that’s possible then I have no issue with having multiple locations.

I sleep better at night with a copy of my data in cold storage. Cloud + NAS is a great starting point but one ransomware attack could easily wipe out both of those (I do have sufficient planning against ransomware but it’s still possible.) Having cold copies of my data in an offsite location provides an additional layer of protection if the worst happens.

Actually this usage isn’t supported out of the box, mostly because of the cache. Suppose that the latest revision on USB #1 is 1. When Duplicacy tries to start a new backup, it will find revision 1 on USB #1, but if the cache contains a copy of revision 1 that was created from the backup performed against USB #2, it will still use the locally cached revision 1, which is different from the remote revision 1 on USB #1.

The workaround is to create a pre-backup script (see Pre Command and Post Command Scripts). That is, create a script file named pre-backup under ~/.duplicacy-web/repositories/localhost/n/.duplicacy/scripts (where n is the index of the backup) that deletes the cache at ~/.duplicacy-web/repositories/localhost/n/.duplicacy/cache.
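A minimal sketch of such a script on a Unix-like machine is shown below; on Windows the equivalent would be a pre-backup.bat that removes the same directory (e.g. with rmdir /s /q), and the index 1 in the path is a placeholder for your backup’s actual index:

    #!/bin/sh
    # pre-backup (sketch): wipe the local cache so Duplicacy re-reads revision files
    # from whichever USB drive is actually connected. Replace 1 with your backup's index.
    rm -rf ~/.duplicacy-web/repositories/localhost/1/.duplicacy/cache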

In addition to not using databases, as @saspus said, there is another characteristic of backups generated by Duplicacy that you may have missed and that is essential: the snapshots (revisions) are immutable. That is, once the backup has been made and the chunks have been uploaded, they will never change, until the day they are intentionally deleted (prune command).

In my case, I back up to an S3 storage with a write-only key. So even if a ransomware attack occurs, it cannot affect the backup. It’s not cold or offline, but in practice it has the same effect.

I run prune manually, and at the time of running it I manually decrypt a key that has permission to delete on S3.

In fact, I think even this is overzealous; I could leave the key unencrypted. If an attack occurs on my computers, I don’t think the ransomware would have the “intelligence” to figure out that that key belongs to that storage and access it directly.
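For anyone wanting to replicate this on AWS S3, here is a rough sketch of what such a restricted key’s policy might look like; the user name, policy name, and bucket name are made up, and note that in practice the backup key also needs list and read access, so “write-only” really means “no delete”:

    # sketch: attach a no-delete policy to the IAM user whose access key Duplicacy uses
    aws iam put-user-policy --user-name duplicacy-backup --policy-name duplicacy-no-delete \
      --policy-document '{
        "Version": "2012-10-17",
        "Statement": [
          { "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject"],
            "Resource": "arn:aws:s3:::my-duplicacy-bucket/*" },
          { "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": "arn:aws:s3:::my-duplicacy-bucket" }
        ]
      }'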


Instead of specifying the long list of exclusions for each backup task of the same source, create one exclusion list file and include it in both. I.e., your exclusions will look like so:

@/Users/james/exclusions.txt
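The file itself just holds ordinary Duplicacy filter patterns; a hypothetical example (the directory names are made up):

    # hypothetical exclusions.txt: "-" excludes, "+" includes, the first matching pattern wins
    -Temp/
    -VirtualMachines/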

Alternatively, you can keep the exclusion metadata alongside the data using .nobackup files. Then you don’t need to worry about maintaining an exclusion file separately.

It would be ideal if I could define a single backup routine (i.e. backup these files from this folder, etc.) and define/schedule multiple locations to send that backup job to. If that’s possible then I have no issue with having multiple locations.

No, that is not possible. But the supported way is not much different.

Define two backup targets, and define two backup schedules to those targets. The two schedules can be the same or different; a schedule is literally just a time, source, and destination.

It’s not worth going the unsupported way just to save those few mouse clicks.

I sleep better at night with a copy of my data in cold storage.

Essentially, you trust a solitary hard drive with no redundancy more than a commercial datacenter. This may feel better, but it does not reflect reality.

Cloud + NAS is a great starting point

It’s a destination, actually.

but one ransomware attack could easily wipe out both of those

It’s impossible with the right setup. Server-side snapshots, immutable keys, and other common techniques eliminate that risk completely.

With your hard drives, though, who’s to say your directly connected disk will survive? It will be encrypted first, and you don’t have any recourse like snapshots. All the ransomware needs to do is sneakily encrypt a single config file on the destination, and your backup suddenly turns into a pumpkin. If anything, connecting a data drive to an infected computer is not a good plan in the first place. (And you can’t really know when it’s infected.)

(I do have sufficient planning against ransomware but it’s still possible.) Having cold copies of my data in an offsite location provides an additional layer of protection if the worst happens.

It gives an illusion of safety without improving anything, which is arguably worse.

Think about it this way: will, say, a credit union or a bank mess with connecting hard drives to a server and moving them around, or will they back up to AWS with carefully configured credentials, snapshotting, and bucket immutability that entirely eliminates the risk of losing data to attacks? And they are arguably a higher-value target. On top of that, a commercial datacenter will keep your data safer, cheaper, and for longer than a bunch of loose disks and a chore.

In other words, in the absence of better reasoning, do what corporations do. And they definitely don’t move a bunch of drives around.

But a few people have provided you with quite a bit of reasoning here, so there is even less reason to stick with the “feel good” setup. Unless that is the goal in itself, simply as a means to better sleep, with the understanding that it does not actually improve data safety.

Search this forum; there is a detailed explanation of how to configure an immutable backup to B2. Until Duplicacy supports Glacier storage, B2 is the most cost-effective online target.

You’re doing it right. The 3-2-1 strategy means not putting all your eggs in one basket, i.e. not putting ALL your backups in the cloud, and not putting ALL your backups in local storage. Cloud is not invulnerable, and choosing 2 separate providers doesn’t necessarily guarantee they’re stored in different datacentres, or that a disaster in one country won’t knock them both out. That’s the point of cloud: you don’t know where your data is (apart from the region, maybe).

So a single external HDD (or two) is perfectly fine as a backup, so long as you assume it can fail at any moment and you have other copies elsewhere. Which sounds like what you’re doing. :+1:

One thing I’d recommend, however, is to test those backups more regularly, unless you’re using them as a source for an off-site copy job. A copy in effect verifies the integrity of the data as it’s copied off the drive, and will produce errors if anything is amiss. If you’re not doing that, you need to test the data by doing restores, etc. (And do that anyway; the more regularly you do it, the quicker you find out.)
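Besides doing test restores, the CLI’s check command can verify a storage; a rough sketch, run from the repository (add -storage <name> if the storage isn’t the default one):

    # verify that every chunk referenced by every revision exists in the storage
    duplicacy check -all
    # additionally download each chunk and verify its contents (much slower)
    duplicacy check -all -chunks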

Now, consider the scenario where your NAS or local external drive gets hit by ransomware. Most likely, the config file gets encrypted and, at that point, you’ll find out immediately. If it doesn’t, and it encrypts random chunks instead, you might not find out until it’s too late. Hence the need to test.