Help understanding Prune

New Duplicacy user here, running the Web-UI version in a Docker environment from my Synology NAS.

I know there have been other threads about understanding the Prune command, but after searching through them, my confusion still isn’t resolved - or at least not in a way I understand - so I would hugely appreciate some direct help!

This is the setup:

  • I have Duplicacy Web-UI running on my Synology NAS.
  • Duplicacy connects to a Backblaze B2 bucket I own. That bucket has a master retention setting of “Keep prior versions of a file for 30 days” active. I use one bucket for all cloud backups.
  • The NAS contains folders of data that very rarely changes. These are things like audio and video files that I need cloud copies of, but really only one or two revisions of. Let’s call these Type 1 files.
  • The NAS also contains folders that are the backup targets of system backups from machines on my network. These change daily, and I need to retain extended cloud revision history for them. These are Type 2 files.

Now I currently, and I suspect foolishly, have a Prune command set up after each one of my backup jobs, set to either:
[-keep 0:14 -keep 7:14 -keep 1:7 -a] (for Type 1; the data that rarely changes)
[-keep 0:7 -keep 7:30 -keep 1:7 -a] (for Type 2; the data that changes often)

Reading these forums, I think this is incorrect and that I should instead have just one master Prune command that handles everything. But I’m not sure exactly what to set it to in order to find a happy medium between my two file types.

I therefore have three questions:

  1. Am I correct in thinking I should remove all the per-job Prune commands and instead create a single Prune job with the [-keep 0:7 -keep 7:30 -keep 1:7 -a] settings? That way my Type 2 files - the daily backups from various systems - will keep a rolling copy of the last 30 days, and the Type 1 files - which almost never change - will do the same but won’t overfill the bucket since they have no new revisions.

  2. Looking at the logs, I can see some of the Prune events on some jobs have affected chunks on other jobs. Have I unwittingly screwed my backups, some of which took me almost a week to initially seed?

  3. Regarding Backblaze’s 30-day version retention, will this interfere with Duplicacy? If it does, am I best - as I think I am - simply telling Backblaze to never prune or remove anything and letting Duplicacy handle all of that?

My apologies if these are simple questions, but I’m very new to Duplicacy and more than a little lost!

Agree with this assessment.

While you can set up different retention policies, it’s a bit tedious if you have many source repositories, as you’ll need to remove the -a (all) flag and specify -id <repo> instead. Because Duplicacy doesn’t let you specify multiple -ids in the same command, you’d have to do a prune for every repo you have.
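For example, pruning two repositories with different policies might look something like this - the snapshot IDs “media” and “sysbackups” are just made-up placeholders here, as are the retention numbers:

    # one prune per repository, since -a can't be mixed with per-repo policies
    duplicacy prune -id media -keep 0:30
    duplicacy prune -id sysbackups -keep 7:30 -keep 1:7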

v3.2.4 of the CLI fixed a bug, which lets you do -collect-only prunes and then a final -delete-only at the end, reducing the overhead a bit, but it’s still no less tedious. (It’d be really nice if you could embed the retention period directly into the config.)
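As a rough sketch of that two-phase pattern (same hypothetical snapshot IDs as above):

    # phase 1: mark chunks as fossils per repository, without deleting anything yet
    duplicacy prune -id media -keep 0:30 -collect-only
    duplicacy prune -id sysbackups -keep 7:30 -keep 1:7 -collect-only
    # phase 2: one final pass that deletes everything collected above
    duplicacy prune -a -delete-only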

BTW, both your prune configs are a bit wrong, I feel… “Keep prior versions of a file for 30 days” - Duplicacy doesn’t work like that; it’s snapshot-based, not file-based.

The closest to that is -keep 0:30, which means “keep no revisions older than 30 days”. Yet you’ve used 0:7 and 0:14. Let’s say you stick with one of them, 0:14 - that still means your oldest revision will be deleted after 14 days. Your second option, -keep 7:14, won’t even get a look-in, because there’ll be no two revisions older than 14 days left to keep 7 days apart.
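To spell that out, here’s your Type 1 prune as a CLI command, with how Duplicacy reads each rule (-keep n:m means “for snapshots older than m days, keep one every n days”, and n = 0 means keep none):

    duplicacy prune -a -keep 0:14 -keep 7:14 -keep 1:7
    # -keep 0:14 -> delete every snapshot older than 14 days
    # -keep 7:14 -> of snapshots older than 14 days, keep one every 7 days
    #               (a dead rule - the 0:14 rule already removed them all)
    # -keep 1:7  -> of snapshots older than 7 days, keep at most one per day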

I encourage you to set a much longer timespan of 0:365 or more (or even remove the 0:m part altogether since de-duplication is very effective). Look at the example in the wiki and plug your own numbers into the wording, so you have a good understanding.

You’ve probably pruned more than you intended to - everything older than 7 days - but it won’t have affected backups going forward. You just won’t be able to restore from any of the removed snapshots. You’ll still have your most recent backups, on which future incremental backups will be based.

No need to re-seed anything.

I’m not a BB user but I suspect you’re best off disabling this feature anyway.

The reason I say this is that I’ve had to deal with recovering raw chunk and snapshot files from a Google Drive, using their limited 30-day history recovery tool.

I encountered issues where ‘restored’ snapshot revisions were pointing to chunks that no longer existed. So if you ever keep the feature enabled, only restore files from the chunks directory - not the snapshot files. Any leftover, unreferenced chunks can always be cleaned up with a prune -exhaustive later.
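That cleanup is just a normal prune with the -exhaustive flag, which scans the whole storage rather than only the known snapshots:

    # find and remove chunks not referenced by any snapshot
    duplicacy prune -exhaustive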

Thank you, Droolio, for taking the time to give me patient and much needed advice on the way Duplicacy handles its backups and retention.

Having now read both your responses and the information you linked to, I’ve dropped the Prune command for all my jobs and introduced a single master Prune that uses the -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7 configuration.
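Spelled out as a command, as I now understand it, that works out to:

    duplicacy prune -a -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
    # older than 360 days: delete everything
    # older than 180 days: keep one revision every 30 days
    # older than 30 days:  keep one revision every 7 days
    # older than 7 days:   keep one revision per day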

Whilst I obviously want to find a happy medium between retaining as many cloud backups as I might need and keeping the total size of those backups as small as possible - since I pay for every GB - I think this setup will give me what I need. Besides, if it does start becoming too expensive, I can tighten the pruning to keep the number of copies, and by extension the cost, down.