Help how to setup Prune over Web-Gui

bluecat · 3 December 2022 12:59

Hello, I’m a little confused on how to set up the correct PRUNE with the Webui. I do daily backups and want the following settings:
Keep daily backups for 12 months
Keep weekly backups for 24 months
Keep monthly backups for 48 months
Delete everything older than 48 months

Is this even possible? Could you tell me, what I need to input in the GUI Menu?

Two more questions:

Is there any disadvantage or loss of speed if I do not use Prune?
Would it also be possible to delete snapshots manually? Let’s say I check manually and decide: OK, now delete everything that is older than 1 year.

Thanks for your help for a beginner

saspus · 3 December 2022 17:40

1. That GUI dialog is just a helper that is enough for most people. Click Save on that dialog. This will create a prune schedule with the retention policy specified in the dialog. Then click on the prune options there and replace the -keep arguments with what you actually want.
2. It depends on your target storage performance (OneDrive is slow, Amazon S3 is fast), the amount of data turnover (how large the daily changes are compared to the total size, e.g. an active VM image vs. a few photos added), and the backup frequency (hourly? daily?). I personally never prune. Why would I voluntarily delete data? Storage is cheap.
3. You can create a schedule/job that is never allowed to run. Then you can specify any arguments you want and run it explicitly/manually. But I guess this question came up because your desired schedule doesn’t fit into the premade GUI dialog, which is solvable as described in the first answer, so you don’t need to do this.

bluecat · 12 December 2022 13:21

Thanks for the answer!
Since I am unsure and scared of breaking something: would it be possible for someone to give me the exact command to include as an argument?

Keep daily backups for 12 months
Keep weekly backups for 24 months
Keep monthly backups for 48 months
Delete everything older than 48 months

saspus · 12 December 2022 20:03

The logic is to go from oldest to newest, because the arguments are evaluated from left to right and the first match is used, and to specify what to keep, with ages in days.

-keep interval:after_age

So:

-keep 0:1440 -keep 30:720 -keep 7:360 -keep 1:1

How to read this:
- -keep 0:1440: Is the revision older than 1440 days (30*48)? Yes: keep 0 revisions. Else: continue.
- -keep 30:720: Is the revision older than 720 days (30*24)? Yes: keep one revision every 30 days. Else: continue.
- -keep 7:360: Is the revision older than 360 days (30*12)? Yes: keep one revision every 7 days. Else: continue.
- -keep 1:1: Is the revision older than one day? Yes: keep one revision every day. Else: continue.
- Do nothing.
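
For anyone who prefers to see the left-to-right reading as code, here is a minimal Python sketch. It is not Duplicacy's own code: `parse_keep` and `describe` are hypothetical helper names, and treating a revision that is exactly `after_age` days old as matching is an assumption.

```python
import re

def parse_keep(options: str) -> list[tuple[int, int]]:
    """Return (interval_days, min_age_days) pairs in the order they were written."""
    return [(int(n), int(m)) for n, m in re.findall(r"-keep\s+(\d+):(\d+)", options)]

def describe(age_days: int, rules: list[tuple[int, int]]) -> str:
    """Walk the rules left to right and report the first one that matches."""
    for interval, min_age in rules:
        # Assumption: an age equal to min_age already counts as a match.
        if age_days >= min_age:
            if interval == 0:
                return f"at least {min_age} days old: delete"
            return f"at least {min_age} days old: keep one revision every {interval} day(s)"
    return "newer than every rule: keep untouched"

rules = parse_keep("-keep 0:1440 -keep 30:720 -keep 7:360 -keep 1:1")
for age in (0, 10, 400, 800, 1500):
    print(age, "->", describe(age, rules))
```

Each sample age falls through the rules until the first one whose age threshold it reaches, which is the same walk as the bullet list above.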

bluecat · 15 December 2022 12:46

Thanks for the code and also for the explanation. Very helpful for me, and maybe others who want to set up their own schedule!

evolze · 28 January 2023 06:16

Sorry to necropost, but I have a follow-up question to this post. It’s a bit of a long one, but @saspus, thank you so much for explaining this in such detail, it’s been a great step in the right direction toward my understanding of this!

My question:
Is it possible to only have one keep option per schedule & job? For example, I want to set up two schedules with pruning. One is for local storage and the other is for cloud storage.

For the prune schedule for both, I’d ideally prefer this:

Local Schedule:

Run the backup once a day
Keep all snapshots and remove them once age hits 30 days | -keep 0:30 ?

Cloud Schedule:

Run the backup once a day
Keep all snapshots and remove them once age hits 90 days | -keep 0:90 ?

The reason I want to do it this way is I do not fully comprehend the meaning/concept of keeping one “weekly” and “monthly” revision, as described above every 7 and 30 days. Are these just specific snapshots that once they hit a specific age from the scheduled runtime date, they are kept and considered weekly/monthly revisions?

If so, with the above scenario, about how many snapshots would be kept? I think I’m confused about how it determines when to keep the revisions and how many there would be. This is how I’m thinking about it, based on the example above:

-keep 30:720 == every 30 days out of 720 days (2 years), a snapshot is kept? Does this mean that every 30 days, a snapshot is kept and removed assuming it’s more than 720 days old?

Same with:

-keep 7:360 == every 7 days out of 360 days (1 year), a snapshot is kept? Does this mean that every 7 days, a snapshot is kept and then removed once it’s 360 days old? If so, how does it have the chance to become 720 days, and then 1440 days old? I feel like that’s not possible, especially because it is going left to right.

Or does it just continue and not remove anything at all? If this is the case, why not just have the following and not have any other -keep options?

-keep 0:1440

What benefits does adding additional keep options provide if it’s only removing them after 48 months in the end anyways?

The whole reason I’m asking is with other backup solutions (i.e., Duplicati), this is how their version retention & deletion worked. It was convenient since once a backup hit 30 or 90 days, it would remove the oldest backup version.

Sorry for the long post, having a tough time understanding this. I don’t mind up’ing the days (i.e., 30 to 180 or 90 to 365). Thanks for taking a look and any additional help or improvements are welcome! I’ve looked at the prune command help page for probably 2 hours and it doesn’t yet click, but I know it will with a little push in the right direction!

Also, thank you so much for the Docker container!

Droolio · 28 January 2023 21:57

To answer your question, it’s important to realise you can only combine these -keep n:m retention parameters in decreasing order. i.e. when you sort them in decreasing order of m (as you must), then n should decrease as well (with the exception of 0:m). Otherwise there’s no point; as you correctly surmised, you can’t do it the other way around. (Can’t remember if Duplicacy stops you from doing that, but it certainly wants m in the correct order.)

An n of 0 is a special case, in that anything and everything older than m gets removed. So you can choose to have that, or not.
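
As a rough illustration of that ordering rule, here is a small Python sketch that flags -keep pairs written in the wrong order. The function name `check_keep_order` is made up for this example, and treating equal m values as a problem is an assumption on my part; this is not a check that Duplicacy itself is known to perform.

```python
def check_keep_order(rules: list[tuple[int, int]]) -> list[str]:
    """Return human-readable problems found in a list of (n, m) pairs.

    n is the interval in days and m the minimum age in days, listed in the
    order the -keep options would appear on the command line.
    """
    problems = []
    for (prev_n, prev_m), (n, m) in zip(rules, rules[1:]):
        if m >= prev_m:
            # Assumption: equal ages are flagged too, since the later rule can never apply.
            problems.append(f"{n}:{m} follows {prev_n}:{prev_m}, but m must decrease")
        elif prev_n != 0 and n >= prev_n:
            # n = 0 means "delete everything older than m", so it is exempt.
            problems.append(f"{n}:{m} follows {prev_n}:{prev_m}, but n should decrease too")
    return problems

print(check_keep_order([(0, 1440), (30, 720), (7, 360), (1, 1)]))  # well-ordered policy
print(check_keep_order([(1, 1), (7, 360)]))                        # ages in the wrong order
```

The first call reports no problems; the second reports that 7:360 comes after 1:1 even though its age is larger.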

In terms of backups in general… weekly, monthly, yearly is just a traditional way to rotate backups - specifically, the Grandfather-father-son system. Duplicacy doesn’t understand these periods so precisely - it’s just numbers of days as far as it sees - so they are simply emulated by 7, 30 etc., and won’t quite match a reliable pattern. (Some backup systems can retain the first backup of the month; Duplicacy isn’t that fussed, nor can it be made to be in the current implementation. Besides, we normally no longer physically rotate them.)

Let’s take a simple example of -keep 0:2555 -keep 30:365 -keep 1:14. We back up every hour (so 24 snapshots in a day).

Notice the lack of a ‘weekly’ option (just to get you to think why you might want multiple options regardless of the difference between the numbers 7 and 30).

In a sense, the first (n=0) and last (n=1) are kinda special cases. The first declares an ultimate expiry. The last - because we can only specify days and not fractions of a day - reduces the number of snapshots we keep each day, after a couple of weeks go by.

Of course you don’t have to do it that way, but after a couple of weeks (say) you very likely won’t need to recover a specific hour (say). If anything went wrong in that day or perhaps you notice the day after, you have a fresh memory and have hourly snapshots to recover from. But as time goes on, generally we’re less interested in recovering from a specific time in the day (or week etc.), so we wanna keep fewer copies (to save on disk space) but still delineate age into human understandable periods, if that’s how we want it.

Really, it’s all just arbitrary and you strictly don’t have to abide by these numbers at all (although maybe you might have a weekly ritual of doing your finances every Sunday of every week, so keeping a weekly version… for a while… helps, until it becomes unimportant. And then you can switch from weekly to monthly).

If 7 years is overkill (for many, removing the 0:m entirely is perfectly fine, too), you could change the ‘monthlies’ to anything over 3 years, and then maybe add a ‘quarterlies’ 'til 10 years. i.e. -keep 0:3650 -keep 90:1095 -keep 7:90 -keep 1:1 - the last couple I threw in to further illustrate, in which ‘monthlies’ doesn’t even feature - I just wanna keep a backup copy once a week beyond 90 days, and 1:1 ensures I’m only keeping 1 a day if I happen to be running hourly or whatever. As time goes on, the gap widens.

I actually do this for one of our clients (1:1), where I’m running Vertical Backup (Duplicacy for VMs) at 6am, 6pm, and midnight, and I really just wanna run it as often as possible, while minimising disk space, resource usage during the day. So even if you just wanted daily snapshots, it really is a good idea to run much more frequently anyway, since your last backup may be as much as 24 hours old. -keep 1:1 then becomes a useful last case.

To answer your question…

You can totally have that Local (0:30) and Cloud (0:90) retention and they’d be compatible with each other - so long as you either back up individually to each storage, OR back up to local and then copy local-to-cloud (as most would). Copying Cloud to Local wouldn’t make much sense, however.

HTH

</long post, soz>

Edit: Clarified the ordering of n and m and some typos.

evolze · 28 January 2023 23:46

Hey @Droolio, thanks so much for the explanation! It helped me a little bit, but I think I’m still getting caught up on understanding the calculations behind how many snapshots/rotations will be kept. For me, that’s the biggest part of understanding how many snapshots I will have, or even when I’ll be keep

Let’s go back to your examples, just to make sure I’m understanding correctly (from left-to-right).

-keep 0:3650 -keep 90:1095 -keep 7:90 -keep 1:1

If snapshot is older than 10 years, remove it. Otherwise, continue to next -keep statement.
If snapshot is older than 3 years (1095 days), keep 1 snapshot every 90 days (quarterly). Otherwise, continue to next -keep statement.
If snapshot is older than 90 days, keep 1 snapshot every 7 days. Otherwise, continue to the next -keep statement.
If snapshot is older than 1 day, keep 1 snapshot every day. [END]

So if I’m breaking this down correctly from newest to oldest, it would look like this:
Day 1-90 snapshots (90 daily snapshots kept)
→ Then
90-1095 (1x snapshot kept every 7 days, Age Day# 90, 97, 104… etc.) (where 90, 97, 104, == day where it’s kept)
→ Then
1095-3650 (1x snapshot every 90 days, Age Day# 1095, 1185, 1275… etc.) (where 1095, 1185, 1275, == day where it’s kept)
→ Lastly
3650+ (Remove anything older)

Anyways, I thought about it last night and since Duplicacy uses the grandfather-father-son backup rotation scheme(as you mentioned), I’d ideally like to implement my prune schedule like this:

A -keep 0:720 -keep 30:720 -keep 7:360 -keep 1:30

OR

B -keep 0:720 -keep 30:720 -keep 7:360 -keep 1:1

Delete anything older than 2 years.
Keep monthly backups for 2 years
Keep weekly backups for 1 year.
Keep daily backups for 1 month.

Is the -keep I expressed above in A or B correct for my scenario? If not, could you break it down for me please on the correct options to use? I’ve been looking at various posts for several hours now hoping it would click, and I think I’m almost there! Thank you for your patience with me

Once again, thanks again for your help with this! I’m really wanting to implement Duplicacy, but this snapshot/prune retention has got my head all wrapped around.

Droolio · 29 January 2023 00:42

What immediately screams out is the first two m's are the same (making the second one redundant).

I believe this was the part that made my head hurt at first - it’s just a matter of the syntactical difference between how us humans might think about it (keep Xly for Y long) and how Duplicacy wants it.

The statement “Keep monthly backups for 2 years” on its own is slightly meaningless. (The 2 refers to the cut-off where snapshots get permanently deleted, but Duplicacy wants to know about the next age in the sequence - i.e. “Keep monthlies older than 1 year”, and so on.) We just need to shift each m up to the prior n in the sequence.

So:

-keep 0:720 -keep 30:360 -keep 7:30 -keep 1:1

And best written this way (similar to the docs on prune):

Keep a revision per day for revisions older than 1 day
Keep a revision every week for revisions older than 30 days
Keep a revision every month for revisions older than 1 year
Keep no revisions older than 2 years
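
The shift from “keep Xly for Y long” to interval:age can also be sketched in a few lines of Python. `to_keep_options` is a hypothetical helper, not part of Duplicacy, and it assumes the finest tier starts at an age of 1 day (as with the -keep 1:1 above):

```python
def to_keep_options(tiers: list[tuple[int, int]]) -> str:
    """Convert "keep one revision every `interval` days for `max_age` days"
    statements into -keep options.

    tiers: (interval_days, max_age_days) pairs, e.g. (7, 360) for
    "weekly for 1 year".
    """
    options = []
    start_age = 1                           # assumption: thinning starts after 1 day
    for interval, max_age in sorted(tiers): # finest interval first
        options.append(f"-keep {interval}:{start_age}")
        start_age = max_age                 # the next tier begins where this one ends
    options.append(f"-keep 0:{start_age}")  # final cut-off: delete everything older
    return " ".join(reversed(options))      # write the oldest age first

# "Daily for 1 month, weekly for 1 year, monthly for 2 years"
print(to_keep_options([(1, 30), (7, 360), (30, 720)]))
```

For the scenario in this thread it prints `-keep 0:720 -keep 30:360 -keep 7:30 -keep 1:1`, i.e. each “for Y long” value ends up attached to the next coarser interval.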

For more info on how Duplicacy does pruning, this might be a good read:

(Bonus script at the end, to test outcomes. Actually, this old thread reminds me it’s probably best to think of n:m as interval:age.)
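
In the same spirit (but not that script), here is a rough Python sketch of a single prune pass, to get a feel for how many revisions survive. It is a simplification under stated assumptions: ages are whole days, the history is a hypothetical one backup per day for 1000 days, and it ignores any timing tolerance or the effect of pruning repeatedly over time. `simulate_prune` is a made-up name.

```python
def simulate_prune(ages: list[int], rules: list[tuple[int, int]]) -> list[int]:
    """Return the ages (in days) that survive a single prune run.

    rules: (interval_days, min_age_days) pairs, oldest age first.
    """
    kept = []
    rule = 0
    last_kept = None                        # age of the last revision kept in this tier
    for age in sorted(ages, reverse=True):  # oldest revision first
        # Move to the next rule while this revision is too young for the current one.
        while rule < len(rules) and age < rules[rule][1]:
            rule += 1
            last_kept = None
        if rule == len(rules):
            kept.append(age)                # younger than every rule: untouched
            continue
        interval = rules[rule][0]
        if interval == 0:
            continue                        # n = 0: delete everything this old
        if last_kept is not None and last_kept - age < interval:
            continue                        # too close to the previously kept revision
        kept.append(age)
        last_kept = age
    return kept

rules = [(0, 720), (30, 360), (7, 30), (1, 1)]  # -keep 0:720 -keep 30:360 -keep 7:30 -keep 1:1
survivors = simulate_prune(list(range(1000)), rules)
print(len(survivors), "revisions kept; oldest is", max(survivors), "days old")
```

With these assumptions, 90 revisions survive: 30 from the last month (today plus 29 dailies), 48 weeklies, and 12 monthlies, with nothing older than 719 days.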

evolze · 29 January 2023 22:34

@Droolio, it finally clicked thank you so much! Although, it was a bit confusing at first with how all the variables shifted over, the visual arrow guide helped (visual learner here).

The included forum post also helped solve another area I was a bit worried about, which was how ‘base’ snapshots were kept.

Also, thank you for the PowerShell script. Once I get Duplicacy set up here in a little bit, I’ll be sure to try it out, along with the prune schedule.

I guess I have one more final question, but not related to pruning:

How should I layout my backup schedule (order of scheduled backup tasks)? Here’s the thought process I had (Local → Cloud):

1. backup to Local repo.
2. Perform a check on the data to verify its integrity.
3. copy from Local repo to Cloud repo.
3.5a Another check here maybe after Local is copied to the Cloud repo?
4. prune previous snapshot data.

Droolio · 30 January 2023 00:42

It’s really up to you. The order would matter more if Duplicacy were able to abort subsequent jobs when one fails - for example, if check fails, you might prefer that it didn’t go ahead and prune. Sadly, it can’t, though that doesn’t matter too much either.

So order isn’t that important, but you might want to put the copy after the prune if you’re running it as one schedule. (To avoid copying snapshots that would get pruned fairly quickly in the Cloud anyway.)

Checks are good but the act of doing the copy effectively checks all chunks exist on the local, source, storage, since you’ll get an error if they don’t. Thus you can probably skip daily local checks and just do the cloud check, which effectively verifies both. (Always good to check both after passage of time, but maybe weekly or monthly if you prefer.)

Personally, on my PC I only do local backup to my server every two hours.

On my server, I do a daily 1) local (server files) backup to cloud, 2) prune local, 3) prune cloud, 4) copy local to cloud, 5) check.

If I didn’t have a server, I’d probably separate jobs into two schedules and run the maintenance (prune & check) perhaps less regularly.

evolze · 31 January 2023 22:49

Much appreciated, thank you again for all your help and guidance these past few days!