Help with Prune/Retention Settings

Environment:

I have several computers on my network that all, using various programs, back up to my Synology NAS drive. The NAS runs Duplicacy via Docker and uploads everything to a Backblaze B2 bucket.

All backups work absolutely fine and run as needed. I have a Check schedule running, and I semi-regularly verify that my backups are valid by restoring randomly selected files.

Everything is running smoothly.

The Question:

I have a scheduled task inside Duplicacy that runs housekeeping. Its main job is to Prune the backups in storage at Backblaze. The setting I have for this is currently:

-keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7

Now I’ve noticed I’m starting to pay Backblaze more each month than I’d like, or than I think is necessary, so I’d like to scale this down so Backblaze is only ever storing the last month’s worth of backups and nothing beyond that.

The reason for this thread is I’m a little confused as to how the Prune schedule works and exactly what command I’d need to achieve this.
I know my -keep 0:360 means no revisions older than 360 days (basically, a year), so am I correct in thinking my new revised command should be:

-keep 0:360 -keep 0:180 -keep 1:30 -keep 1:7

That would give me only 1 copy at a month old, I think? But what does the 1:7 mean? The help documents say it’s “1 revision for revisions older than 7 days”. Why older than 7 days? I want 1 revision each day, which is when the backup happens, and then nothing over a month; i.e. I only ever want a rolling 30 days of backups. Does the above command satisfy that?

Apologies if this is a simple question, but I’m having a hard time following how these retention commands work.

You want this:
-keep 0:180 -keep 1:30

That will not keep any snapshots older than 6 months; it will delete them all. It will then keep 1 snapshot every day for any snapshots older than 1 month. For any snapshots made within a month, it will just keep every single snapshot you create. (Technically, you don’t even need that second -keep because you are only backing up once a day, so in practice it’s not really going to delete anything.)

The help documents say it’s “1 revision for revisions older than 7 days”. Why older than 7 days?

Because the command you are running is prune, which is a command to DELETE snapshots. -keep is just a flag. It’s saying, “while you are in the process of deleting a bunch of stuff, don’t delete this stuff.”

If you want 1 snapshot for each day up to 30, then just make sure you run a backup at least every day. On the other hand, if you were backing up every hour, then you might want to add more flags to decide how many of those to keep.

That makes sense; thank you.

One more thing: most of my backups are daily, but I also have two jobs scheduled on the NAS that back up weekly on a specific day. These are backups that aren’t subject to much change, so they don’t need snapshotting as frequently.

In the example you gave (-keep 0:180 -keep 1:30), can I assume, then, that the daily backups will be one a day and not go over 30 days (i.e. there will be 30 copies of them) and the weekly ones will have four historical examples?

Here’s what we get if we run duplicacy prune -h

-keep <n:m> [+] keep 1 snapshot every n days for snapshots older than m days

So just plug in our -keep 1:30 and it will tell us what happens:
keep 1 snapshot every 1 day for snapshots older than 30 days.

This means:

  • There is no “30 copies” anywhere here
  • For any snapshot that is less than 30 days old, duplicacy won’t delete it. How many of those snapshots there are just depends on how many you made.
  • For snapshots older than 30 days, duplicacy will keep 1 per day and delete the rest.
  • If the number of snapshots Duplicacy encounters is smaller than the prune command is instructed to keep, then it will just keep them all.
  • For example, if you run prune with the keep flags described above, -keep 0:180 -keep 1:30, duplicacy will DELETE all snapshots older than 180 days and NOT TOUCH any snapshots less than 30 days old. It will look at snapshots between 30 and 180 days old. If it finds only one per week or two per week, then it won’t delete anything. If it finds 10 or 20 per week, then it will delete 3 or 13 per week (that is, enough to get to 1 per day).
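
If it helps to see the rule in action, here is a rough Python sketch that simulates the behaviour described in the bullets above. It is a simplified model written for illustration, not Duplicacy's actual code: the function names (parse_keep, simulate_prune) are made up, "older than m days" is assumed to mean strictly older, and "1 every n days" is assumed to mean "keep a snapshot only if it is at least n days after the last one kept under the same rule".

```python
from datetime import datetime, timedelta


def parse_keep(spec):
    """Turn an "n:m" string such as "1:30" into the pair (n, m)."""
    n, m = spec.split(":")
    return int(n), int(m)


def simulate_prune(snapshots, keep_specs, now):
    """Split snapshot times into (kept, deleted) under a simplified -keep model.

    Assumption: a snapshot is governed by the rule with the largest m that it
    is older than; snapshots older than no rule's m are never deleted.
    """
    rules = sorted((parse_keep(s) for s in keep_specs),
                   key=lambda rule: rule[1], reverse=True)
    kept, deleted = [], []
    last_kept = {}  # m -> time of the last snapshot kept under that rule
    for snap in sorted(snapshots):  # oldest first
        age = now - snap
        rule = next(((n, m) for n, m in rules if age > timedelta(days=m)), None)
        if rule is None:
            kept.append(snap)       # newer than every threshold: untouched
            continue
        n, m = rule
        if n == 0:
            deleted.append(snap)    # n = 0 means keep nothing in this range
        elif m not in last_kept or snap - last_kept[m] >= timedelta(days=n):
            kept.append(snap)
            last_kept[m] = snap
        else:
            deleted.append(snap)    # too close to the last kept snapshot
    return kept, deleted


# 200 days of once-a-day backups, pruned with -keep 0:180 -keep 1:30
now = datetime(2024, 1, 1)
daily = [now - timedelta(days=i + 0.5) for i in range(200)]
kept, deleted = simulate_prune(daily, ["0:180", "1:30"], now)
print(len(kept), len(deleted))  # 180 20
```

With once-a-day backups, the only snapshots removed in this model are the 20 that are older than 180 days, which matches the point that -keep 1:30 does not delete anything when you only back up daily.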

Please also keep in mind that you can run prune against all snapshot ids (-a) or against a specific snapshot id (-id <snapshot id>), so when it comes to the different jobs you have scheduled, it will depend on how you run the backups (same snapshot id, or different ids) and how you run the prunes.

Hokay, I think I get how Duplicacy is handling these backups now, thank you.

Honestly, I still don’t get why it doesn’t just do the same as most backup software packages and clearly state “keep X copies” and then rollover at that value. That would be so much easier to understand and account for!

You want to keep a cadence of backups over time, regardless of the number of backups taken at any specific point in time, so “keep X copies” is meaningless, because you can’t know whether these X copies will span one day or one year.

For example, if you take hourly backups, but one day your machine slept for 8 hours and another day for 16 hours, you will end up with backup histories of different lengths. This is bad and inconsistent. Pruning by the time the backup was taken will do the right thing there.
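
To put numbers on that, here is a small Python sketch of a count-based "keep the newest 24 copies" policy applied to two made-up hourly schedules. Everything in it (the helper names, the 3-day window, the sleep patterns) is an assumption chosen for illustration, and it has nothing to do with how Duplicacy itself is implemented.

```python
from datetime import datetime, timedelta


def hourly_backups(start, hours, asleep_hours_per_day):
    """Hourly snapshot times, skipping the first N hours of each day (asleep)."""
    return [start + timedelta(hours=h)
            for h in range(hours)
            if h % 24 >= asleep_hours_per_day]


def history_span(snapshots, copies):
    """Time covered by the newest `copies` snapshots (count-based retention)."""
    newest = sorted(snapshots)[-copies:]
    return newest[-1] - newest[0]


start = datetime(2024, 1, 1)
light_sleeper = hourly_backups(start, 72, asleep_hours_per_day=8)
heavy_sleeper = hourly_backups(start, 72, asleep_hours_per_day=16)

# Same "keep 24 copies" rule, very different amounts of history retained.
print(history_span(light_sleeper, 24))  # 1 day, 7:00:00
print(history_span(heavy_sleeper, 24))  # 2 days, 7:00:00
```

The same count gives 31 hours of history on one machine and 55 hours on the other, whereas a time-based rule covers the same window on both.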

Pretty much all backup software operates that way — by specifying time intervals in the past the snapshots must be kept at.

Duplicacy’s way of specifying that cadence is different, but it’s no better and no worse than any other approach. It provides flexibility at the expense of extra complexity. The Web UI, however, has a preset for the most common configuration (GFS) to hide the complexity from those who don’t need it.