Newbie asking if they're doing this right?

gecko · 3 August 2020 13:42

So having been disappointed with the support and lack of documentation from Arq 6, I have been on the lookout for a new backup strategy, and have come across Duplicacy. This seems to be highly recommended by most people who have left Arq and on backup sub-reddits.

As a complete noob to Duplicacy, I wanted to know if I am doing this correctly, as I think it is a little daunting and complicated for the uninitiated, especially with the use of its own nomenclature. I am using the Web-based GUI.

I want to back up my macOS Home Folder to both OneDrive and Backblaze B2. I want B2 to include iTunes backups of my iOS devices (Library/Application Support/MobileSync/*), but I do not want OneDrive to include them. Apart from this, I want them to be the same.

I have therefore done the following in order:

Created OneDrive Storage and ticked the ‘Copy-compatible’ box, even though there was nothing to make it copy compatible to.
Created B2_Backup storage, making it Copy-compatible to OneDrive.
Created B2_OneDriveMirror storage in the same B2 bucket as above.
Created Backup with ID of ‘GeckosMBP_HomeFolder’ backing up the /Users/gecko directory to OneDrive storage, with an exclusion rule e:Library/Application Support/MobileSync/Backup/.*$
Created Backup with the same ID of ‘GeckosMBP_HomeFolder’ backing up the same /Users/gecko directory but to B2_Backup storage, without the above exclusion rule.
Created a schedule that first performs a Backup GeckosMBP_HomeFolder to OneDrive, then of GeckosMBP_HomeFolder to B2_Backup, then a Copy of OneDrive to B2_OneDriveMirror.
Executed schedule.
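To sanity-check the exclusion rule from the steps above before a long upload, the regex can be tried against a few sample paths. This is only a rough sketch: Duplicacy uses Go's regexp engine, while this uses Python's `re`, which should behave the same for a pattern this simple. It also assumes patterns are matched anywhere in a path relative to the backed-up directory; the sample paths are made up for illustration.

```python
import re

# The pattern from the exclusion rule, without the leading "e:" that marks
# it as an exclude regex. Assumption: it is matched against paths relative
# to the backed-up directory (/Users/gecko), anywhere in the path.
EXCLUDE = re.compile(r"Library/Application Support/MobileSync/Backup/.*$")


def is_excluded(relative_path: str) -> bool:
    """Return True if the sample path would be caught by the exclude regex."""
    return EXCLUDE.search(relative_path) is not None


# Hypothetical sample paths, for illustration only.
samples = [
    "Library/Application Support/MobileSync/Backup/abc123/Manifest.db",
    "Library/Application Support/MobileSync/Backup/",
    "Library/Application Support/MobileSync/other.plist",
    "Documents/notes.txt",
]

for path in samples:
    print(("excluded" if is_excluded(path) else "kept    "), path)
```

Note that the rule only covers MobileSync/Backup/, so anything sitting directly in MobileSync/ would still be backed up to OneDrive.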

How does this look? Is this the correct way of doing this? As far as I can tell, I will then have the same backup on both OneDrive and B2_OneDriveMirror, with the added option of restoring iTunes backups from B2_Backup if ever needed.

What sort of pruning schedule should I then set up for this?

Look forward to hearing some advice, as it will be much appreciated by this newbie.

tangofan · 3 August 2020 17:34

Yes, it is indeed a bit daunting and it took me a bit as well to get the hang of it.

Are you sure that you are creating two different storages in the same B2 bucket? As far as I know that isn’t possible yet, though perhaps that has changed since I set up Duplicacy-Web a few months ago.

I wouldn’t do it this way:

If I understand your process correctly, then you’d have some data in B2_Backup and then the same data in B2_OneDriveMirror without any deduplication, since these are separate storages.
I wouldn’t use different backups with the same backup ID and different exclusion rules. It may work, but it seems pretty messy (to me at least)
Since the copies of your HomeFolder data in B2_Backup and B2_OneDriveMirror are independent, you don’t even have the exact same versions there if files changed between the execution of the two jobs.

What I would do (names are just for clarity):

Create one backup ID “HomeFolder” and make sure to exclude the iTunes stuff here. Back this data up to OneDrive. Create a copy job to duplicate it to a copy-compatible storage on B2.
Create a 2nd backup ID “iTunes” that only includes the iTunes stuff, nothing else. Back it up to B2, either into the same bucket/storage or into a different one. If you back up into a different storage, you could still make it copy-compatible (no harm there AFAIK), but you don’t need to.

As I said above, if I understand this correctly, you’d also have an extra duplicate of the data in B2_Backup, which is probably what you don’t want.

Really up to you and your data. The beauty of having two separate backup IDs for the user data (excluding iTunes) and the iTunes stuff is that you can set separate prune rules. E.g. for your user data you might want extra long retention (say, versions going back 5 or 10 years), but for the iTunes stuff it would make sense to have much shorter retention (say 6 months), since the data shouldn’t ever change and perhaps all you want is some protection from ransomware attacks.

FWIW, I have two separate backup schedules: one with a basic backup+copy that runs multiple times a day, and one at night that does backup+copy+prune+check. However, depending on how large your backup set is, it could make sense to run prune (and/or check) only on a weekly basis. My backup set is only about 50GB, so running it daily is no problem.

FWIW these are my prune parameters in the Web-UI:

-keep 180:1831 -keep 90:1101 -keep 30:733 -keep 14:367 -keep 7:184 -keep 3:93 -keep 1:32 -threads 4 -a

This uses the same prune rules for all backup IDs in my set (that’s what -a does). By using the -id parameter instead, you could limit the prune to a specific backup ID and then use another prune job for the 2nd backup ID.

As you can see, this prune schedule is for long-term retention of all data, with the frequency of versions going down over time. For your iTunes backup ID, something like this might be more appropriate:

-keep 0:186 -keep 7:31 -keep 1:7 -threads 4 -id iTunesStuffID

This would keep every version from the last seven days, one version per day for versions older than seven days, one version per week for versions older than 31 days, and no versions older than 186 days.
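If the n:m pairs are hard to read, a small script can spell out what each rule means. This is only a sketch of my understanding of the prune rules (-keep n:m keeps one version every n days for versions older than m days, n = 0 deletes them, and rules are listed with the largest m first). It is not Duplicacy's own code, and the helper names are made up for this example.

```python
from typing import List, Optional, Tuple

# (n, m): keep one version every n days once a version is older than m days.
Rule = Tuple[int, int]


def parse_keep(options: str) -> List[Rule]:
    """Pull the (n, m) pairs out of a string of prune options."""
    tokens = options.split()
    rules: List[Rule] = []
    for i, tok in enumerate(tokens):
        if tok != "-keep":
            continue
        if i + 1 >= len(tokens) or ":" not in tokens[i + 1]:
            raise ValueError("-keep must be followed by n:m")
        n, m = tokens[i + 1].split(":")
        rules.append((int(n), int(m)))
    ages = [m for _, m in rules]
    if ages != sorted(ages, reverse=True):
        raise ValueError("-keep rules should be listed with the largest m first")
    return rules


def interval_for_age(rules: List[Rule], age_days: int) -> Optional[int]:
    """Which interval applies to a version of this age?

    Returns n from the first matching rule (0 means delete), or None if the
    version is too young for any rule and is simply kept. Treating "older
    than" as a strict comparison is an assumption of this sketch.
    """
    for n, m in rules:
        if age_days > m:
            return n
    return None


itunes = parse_keep("-keep 0:186 -keep 7:31 -keep 1:7 -threads 4 -id iTunesStuffID")
for age in (3, 10, 40, 200):
    n = interval_for_age(itunes, age)
    if n is None:
        print(f"{age:>4} days old: kept (no rule applies yet)")
    elif n == 0:
        print(f"{age:>4} days old: deleted")
    else:
        print(f"{age:>4} days old: one version kept per {n} day(s)")
```

Other options in the string, such as -threads, -a and -id, are ignored by the parser, so the same function works on the longer schedule as well.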

Hope this helps a bit.

gecko · 3 August 2020 22:09

Firstly, thanks for your extensive reply, I really appreciate it

Glad I’m not the only one

I believe so? I created the B2_Backup storage in the B2 bucket first, and checked the copy-compatible checkbox with OneDrive. I then created the B2_OneDriveMirror storage, and used the exact same App Key for accessing the bucket as before (which only has access to that particular bucket), and it created just fine. Indeed, in my list of storages both have the same b2://xxx url location.

I don’t think that is the case. I believe there is deduplication when they are in the same bucket, as they use the same config. I followed the instructions from Back up to multiple storages

According to Back up to multiple storages doing this would cause you to download all the data from the first storage (in my example OneDrive) onto your computer, before then re-uploading to the second. So what you do instead is to create a dummy snapshot (B2_Backup in my example) that you use to run a backup command to get all the same data into the cloud, that then the real mirror backup (by using the copy command) uses.

So basically, if I have followed that How To correctly, my B2_Backup storage is my dummy snapshot, and my B2_OneDriveMirror storage is my real snapshot, in that it is a mirror of my OneDrive storage. The difference being that I can also use my dummy snapshot to restore iTunes backups if and when needed.

The problem I have is that the nomenclature is pretty confusing, and I don’t know if I am using the correct terms, and it seems like a lot of the terms in the How To’s are more geared towards using the CLI. For example, the How To I’ve referenced uses the terms Snapshot ID’s, but I haven’t come across that term anywhere in the Web GUI, so I’m trying to follow it, and hoping that the setup I’ve done is at least somewhat a correct version of what that How To is advising.

This actually might be a better way of doing it though, and like you say, just create a new bucket to contain those backups. I was trying to adapt the Back up to multiple storages but this method would certainly be cleaner.

tangofan · 3 August 2020 22:48

I am actually wondering if these are effectively just the same storage, like a directory that has two mount points and thus can be accessed by two different file paths. If you already have some data in there, there’s an easy way to find out: run a check command against both storages.

If they have the same content, e.g. the same backup-IDs and versions, then it’s only one storage. If the content is different, then this scenario is likely way above my pay-grade.

Again, it may (or may not) be a case of it being effectively the same storage.

[quote]
According to Back up to multiple storages doing this would cause you to download all the data from the first storage (in my example OneDrive) onto your computer, before then re-uploading to the second. So what you do instead is to create a dummy snapshot (B2_Backup in my example) that you use to run a backup command to get all the same data into the cloud, that then the real mirror backup (by using the copy command) uses.

So basically, if I have followed that How To correctly, my B2_Backup storage is my dummy snapshot, and my B2_OneDriveMirror storage is my real snapshot, in that it is a mirror of my OneDrive storage. The difference being that I can also use my dummy snapshot to restore iTunes backups if and when needed.[/quote]

Oh, okay, I missed how that’s supposed to work. Yes, you are effectively trading data transfer amount for storage space here. One thing I would suggest (once again assuming it’s effectively the same storage) is to use a very short prune schedule for the “dummy” backup ID, because once you’ve copied the “real” data, you don’t need those other blocks anymore. So assuming that this all happens in the same schedule in sequence, you could expire those “dummy” versions after just a few days. (Of course, if your iTunes data happens to be part of the “dummy” backup ID, then you may want to be less aggressive with pruning. Another reason for keeping that in a separate backup ID.)

It is very inconsistent and that makes it confusing. E.g. the CLI version calls them snapshot IDs, while the Web GUI uses the term backup IDs. That is, if I remember this correctly.

You could still use that “Back up to multiple storages” method for the stuff going into B2 for your user data, if you wanted to. That’s kind of independent from separating the iTunes data out. Whether that’s worth it in the long run, I’m not sure. That would depend on how big your initial backup is, how big your delta backups will be, and what data cap, if any, your internet provider imposes.
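To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes that a plain copy downloads every new chunk from OneDrive and re-uploads it to B2, and that after an extra backup run straight to B2 the copy finds those chunks already present and skips them. It ignores compression, deduplication and metadata, and the function name and figures are made up for illustration.

```python
def total_transfer_gb(initial_gb: float, daily_delta_gb: float, days: int,
                      preseed_destination: bool) -> float:
    """Estimate traffic over your own connection for one backup + one mirror.

    preseed_destination=False: backup to OneDrive, then copy to B2
        (upload once, download once, upload again).
    preseed_destination=True: backup to OneDrive, backup to B2, then copy
        (upload twice; the copy is assumed to skip chunks already in B2).
    """
    new_data = initial_gb + daily_delta_gb * days
    upload_to_primary = new_data
    if preseed_destination:
        # The second backup reads from local disk, so nothing is downloaded.
        return upload_to_primary + new_data
    # Plain copy: every new chunk comes down from OneDrive and goes up to B2.
    return upload_to_primary + 2 * new_data


# Example: 50 GB initial backup, 1 GB of changes per day, over 30 days.
plain = total_transfer_gb(50, 1, 30, preseed_destination=False)
seeded = total_transfer_gb(50, 1, 30, preseed_destination=True)
print(f"plain copy:   {plain:.0f} GB")
print(f"with preseed: {seeded:.0f} GB")
```

The saving in transfer is paid for with the extra versions the second backup leaves in B2 until they are pruned, so with a small backup set or no data cap the simpler backup+copy may be good enough.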