Understanding pruning

Arty.R · 26 August 2019 16:55

I also have a question about this command.
If I set the options
-keep 0:360 -keep 7:90 -keep 30:180
then -keep 0:360 means “no snapshots older than 360 days”.

Will files older than 360 days also be removed from the backup?

gchen · 26 August 2019 18:34

It means backups older than 360 days will be removed. The prune command only works on backups as a whole; it doesn’t look into individual files contained in each backup.

zizheng.tai · 26 August 2019 18:56

In other words, as long as a file is included in a snapshot within the latest 360 days (i.e. not pruned), it will be kept, right?

gchen · 26 August 2019 20:56

That is correct…
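
To make the snapshot-level behaviour concrete, here is a toy Python model. It is not Duplicacy code, and the revision numbers, ages and file names are all invented for illustration. Each revision is just a set of file names, and pruning drops whole revisions rather than individual files.

```python
# Toy model only: a revision is an age in days plus the set of files it contains.
# All values below are made up for illustration.
revisions = {
    1: {"age_days": 400, "files": {"report.doc", "old_notes.txt"}},
    2: {"age_days": 200, "files": {"report.doc"}},  # old_notes.txt was deleted before this backup
    3: {"age_days": 10, "files": {"report.doc", "new.txt"}},
}


def prune_older_than(revs, max_age_days):
    """Drop whole revisions older than max_age_days (the effect described for -keep 0:360)."""
    return {r: v for r, v in revs.items() if v["age_days"] <= max_age_days}


def recoverable_files(revs):
    """A file can be restored if at least one surviving revision contains it."""
    files = set()
    for v in revs.values():
        files |= v["files"]
    return files


kept = prune_older_than(revisions, 360)
print(sorted(kept))                     # [2, 3]
print(sorted(recoverable_files(kept)))  # ['new.txt', 'report.doc']
```

In this sketch report.doc survives because newer revisions still contain it, while old_notes.txt is lost together with the only revision that held it.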

Arty.R · 27 August 2019 05:48

If I understood correctly:
I have a directory where some of the files were deleted.
Will an archive older than 360 days no longer have the deleted files?
And if I want to be able to recover files deleted more than 360 days ago, should the prune options look like this?

-keep 1:360 -keep 7:90 -keep 30:180

towerbr · 27 August 2019 12:59

Just a little nomenclature correction to keep users aligned:

“revisions older than 360 days will be removed”

Arty.R · 27 August 2019 16:53

OK, let it be revisions )
But my question still stands: if I deleted a file from a folder more than 360 days ago, will it no longer be included in the backup
with this prune scheme?
-keep 0:360 -keep 7:90 -keep 30:180

And do I need this instead?
-keep 1:360 -keep 7:90 -keep 30:180

TheBestPessimist · 29 August 2019 12:07

That is wrong: the -keep n:m options have to be ordered by m in decreasing order (see Prune command details).

So the correct way to write the -keep options would be

-keep 0:360 -keep 30:180 -keep 7:90

Warning, I could be talking crap here!

-keep 1:360 is understood as: keep a revision every 1 day, for all the revisions older than 360 days.

So you do all the previous prunes (-keep 30:180 -keep 7:90), and this final -keep will store all the revisions that were left untouched.

In the end, under these 3 -keep conditions, it means that after 360 days you would have 1 revision stored forever every 30 days.

Droolio · 29 August 2019 13:59

Yup, there’s literally no point putting that -keep 1:360 in there - it does nothing, as the prior -keep 30:180 rule is the biggest interval. You need to choose a bigger interval or 0 to keep nothing. (While you have to order them by m/age decreasingly, the n/interval only makes sense if they decrease in order too.)
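
The two ordering rules mentioned here (m decreasing, and n decreasing along with it) can be checked mechanically. The following is a minimal Python sketch of such a check; the function names are hypothetical, and this is an illustration of the rules as stated in this thread, not something Duplicacy itself provides.

```python
def parse_keep(option):
    """Split an 'n:m' string into (interval_days, age_days)."""
    n, m = option.split(":")
    return int(n), int(m)


def check_keep_order(options):
    """Return a list of problems found in a sequence of 'n:m' strings.

    Rules checked (as described in the thread, not taken from Duplicacy's code):
      * the age m must strictly decrease from one option to the next;
      * the interval n should decrease too, except that n = 0 means 'keep nothing'.
    """
    problems = []
    rules = [parse_keep(o) for o in options]
    for (n1, m1), (n2, m2) in zip(rules, rules[1:]):
        if m2 >= m1:
            problems.append(f"{n2}:{m2} should come before {n1}:{m1} (m must decrease)")
        elif n1 != 0 and n2 >= n1:
            problems.append(
                f"{n1}:{m1} adds nothing: {n2}:{m2} already thins revisions to one every {n2} days"
            )
    return problems


print(check_keep_order(["0:360", "30:180", "7:90"]))  # []
for problem in check_keep_order(["1:360", "30:180", "7:90"]):
    print(problem)
```

The first call reports no problems for the corrected ordering; the second flags 1:360, because revisions that old have already been thinned to one every 30 days by the time that rule applies.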

Droolio · 29 August 2019 14:12

I kinda get what @Arty.R might be trying to accomplish…

CrashPlan had this option where you could tell it ‘never remove deleted files’ which I guess meant old revisions would get pruned but there’d be at least one copy. The issue is, Duplicacy can’t do this because pruning is done on a snapshot level.

Personally, I don’t think that’s a problem. CrashPlan’s feature was iffy at best and, from a design perspective, hard to implement and the extra storage requirements could get crazy.

What defines ‘a file’ exactly? What happens if that file is renamed? You’d have to properly detect file deletions at the least. CrashPlan couldn’t always do this, hence it had a (sloooow) verification scan that ran daily.

You accidentally rip a 4K Bluray onto your desktop and it gets backed up - you didn’t want that, but how do you remove it from the backup? CrashPlan had a way to do it, but it was quite fiddly compared to just being able to delete a snapshot.

Droolio · 29 August 2019 14:24

One thing you could do with Duplicacy is set up a special repository - a recycle bin of sorts (maybe the actual recycle bin?). Deleted files would go in there and be backed up. De-duplication would take care of the disk space.

So long as they continue to be referenced by at least one revision, it doesn’t matter that they get removed from the other repositories or even from the bin repo after a backup. So if you don’t prune that repository, those snapshots (files) can stay in the backup storage.

Now the slight issue is getting Duplicacy to omit that one repository when doing a prune. Sadly, you’d have to prune all other repositories one-by-one instead of using -all. That’s a lot of extra work. Maybe the parameters could be improved to allow multiple -ids, exclusions etc.? Or perhaps different retention periods applied directly to each repository - saved in the preferences file, like we discussed for the set rate-limit options.
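
As a rough illustration of the one-by-one workaround, the Python sketch below only builds the command lines; it does not run anything. It assumes that prune's -id option selects a single snapshot ID, and the IDs ("laptop", "desktop", "recycle-bin") and the retention rules are hypothetical placeholders.

```python
import shlex


def prune_commands(snapshot_ids, keep_rules, skip_ids=()):
    """Build one 'duplicacy prune' command per snapshot ID, leaving out skip_ids.

    The commands include -dry-run so they can be reviewed safely first.
    """
    commands = []
    for sid in snapshot_ids:
        if sid in skip_ids:
            continue  # e.g. the 'recycle bin' repository that should never be pruned
        args = ["duplicacy", "prune", "-id", sid]
        for rule in keep_rules:
            args += ["-keep", rule]
        args.append("-dry-run")
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


# Hypothetical IDs and rules, for illustration only.
for cmd in prune_commands(
    ["laptop", "desktop", "recycle-bin"],
    ["0:360", "30:180", "7:90"],
    skip_ids={"recycle-bin"},
):
    print(cmd)
```

Printing the commands instead of executing them keeps the sketch harmless; whether to drop -dry-run and actually run them is a separate decision.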

Arty.R · 30 August 2019 14:06

Now I’m completely confused ))

Then could you tell me how this command should be written so that I can restore a file that was deleted, for example, two years ago?
In that case, I also need a daily copy for three months and a weekly copy for anything older than three months.

Droolio · 30 August 2019 14:42

Duplicacy currently cannot guarantee to keep deleted files forever if you prune by retention policy (-keep) - not unless you employ some trick like I mentioned in my last post.

At the end of the day, the thing you need to understand is that backups are stored in snapshots of files.

Pruning deals with removing snapshots, not individual files. Thus deleted files merely no longer appear in subsequent snapshots. And pruning all earlier snapshots will remove all history of such deleted files.

-keep 7:90 -keep 1:1

Edit: This rule will keep weekly snapshots forever, the only way to return a deleted file from years ago.

Arty.R · 30 August 2019 14:49

Do I understand correctly that with these options all snapshots will be saved, but those older than 90 days will be kept only once a week?

-keep 7:90 -keep 1:1

akvarius · 31 August 2019 00:21

I think @Droolio has it right, and -keep 7:90 means keep one revision (of snapshot) per 7 days, so you will have one snapshot (revision) a week for snapshots older than 90 days. Younger snapshots are saved except if you add more -keep x:y options with smaller y.

I think -keep 1:1 is only necessary if you make multiple backups per day and only want to keep one per day. If you make one backup per day you can skip that option, since it will not make your younger snapshots any safer. (They are already safe for 90 days.)

The important take-away from this is what @Droolio says, and which is a bit unfamiliar compared to CrashPlan:

This needs to be said strongly so users don’t get surprised:
When you start using prune, you will lose some deleted files and original versions of changed files.
This means that -keep 7:90 might be too aggressive for your needs.
If 90 days is enough time to discover a need and restore a file, then you are OK!
If you would rather give yourself a one-year safety margin, then use -keep 7:365 (for instance).
If you can’t afford to lose any originals, then you should probably not prune.
(All of this depends on your use case of course, including storage limits too.)

Please don’t get me wrong, I love it, and the snapshot-based design is what makes it very elegant, clean, tidy and fast!

Arty.R · 31 August 2019 12:20

Is it necessary to have
-keep 1:1
in the keep options?

Doesn’t just
-keep 7:90
mean this?
Keep 1 snapshot every 1 day(s) if not older than 90 day(s)
Keep 1 snapshot every 7 day(s) if older than 90 day(s)

Droolio · 31 August 2019 13:46

Nope, not necessary - if you only do daily backups.

Strictly speaking, -keep 7:90 on its own means:

Keep 1 snapshot every 7 day(s) if older than 90 day(s)

With no other -keeps, anything less than 90 days old won’t get pruned at all. So if you did hourly backups, you’d have 90 x 24 backups within that age range. Otherwise, daily backups would leave the last 90 backups plus the weeklies beyond 90 days.
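
The counts above can be reproduced with a small simulation. This is a simplified Python model of the -keep semantics as described in this thread (ages in whole or fractional days, revisions walked from oldest to newest); it is an assumption-laden sketch, not Duplicacy's actual implementation, and it ignores any special cases the real tool may have.

```python
def simulate_prune(ages_days, keep_rules):
    """Return the ages (in days) of the revisions that survive.

    ages_days  -- ages of the existing revisions, in days
    keep_rules -- list of (n, m) pairs, ordered by m decreasing, meaning
                  'keep 1 revision every n days if older than m days'
                  (n = 0 means keep nothing that old)
    """
    kept = []
    current_rule = None
    last_kept_age = None
    for age in sorted(ages_days, reverse=True):  # oldest revision first
        # First rule (largest m) whose age threshold this revision has reached.
        rule = next(((n, m) for n, m in keep_rules if age >= m), None)
        if rule != current_rule:
            current_rule, last_kept_age = rule, None
        if rule is None:
            kept.append(age)  # younger than every threshold: never pruned
        elif rule[0] == 0:
            continue  # interval 0: nothing this old is kept
        elif last_kept_age is None or last_kept_age - age >= rule[0]:
            kept.append(age)
            last_kept_age = age
    return kept


# Daily backups for 120 days with only -keep 7:90.
daily = simulate_prune(range(120), [(7, 90)])
print(len(daily))                       # 95
print([a for a in daily if a >= 90])    # [119, 112, 105, 98, 91]

# Hourly backups for 90 days: nothing has reached the 90-day threshold yet.
hourly = simulate_prune([h / 24 for h in range(90 * 24)], [(7, 90)])
print(len(hourly))                      # 2160
```

In this model, 120 days of daily backups leave the 90 most recent revisions untouched plus five weeklies beyond 90 days, and 90 days of hourly backups leave all 90 x 24 = 2160 revisions in place.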

Arty.R · 31 August 2019 18:32

Thanks guys, I think I understand how it works now )
One final question from me ))

this structure means
-keep 0:1800 -keep 90:730 -keep 30:365 -keep 7:180 -keep 3:90 -a

Keep 1 snapshot every 3 day(s) if older than 90 day(s)
Keep 1 snapshot every 7 day(s) if older than 180 day(s)
Keep 1 snapshot every 30 day(s) if older than 365 day(s)
Keep 1 snapshot every 730 day(s) if older than 730 day(s)
Keep no snapshot if older than 1800 day(s)

Right ?

Droolio · 1 September 2019 00:01

Close. That outcome looks almost like the output you get when running prune… if you want to see exactly what it does without actually performing it, just add -dry-run to the command:

$ duplicacy prune -a -keep 0:1800 -keep 90:730 -keep 30:365 -keep 7:180 -keep 3:90 -dry-run

Keep no snapshots older than 1800 days
Keep 1 snapshot every 90 day(s) if older than 730 day(s)
Keep 1 snapshot every 30 day(s) if older than 365 day(s)
Keep 1 snapshot every 7 day(s) if older than 180 day(s)
Keep 1 snapshot every 3 day(s) if older than 90 day(s)
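
The wording of those lines follows mechanically from each n:m pair. The short Python sketch below uses a hypothetical helper (not part of Duplicacy) to reproduce the text quoted above from the same rules, which can be handy for sanity-checking a policy before running prune.

```python
def describe_keep(option):
    """Turn an 'n:m' retention rule into the sentence quoted in the thread."""
    n, m = (int(x) for x in option.split(":"))
    if n == 0:
        return f"Keep no snapshots older than {m} days"
    return f"Keep 1 snapshot every {n} day(s) if older than {m} day(s)"


rules = ["0:1800", "90:730", "30:365", "7:180", "3:90"]
for rule in rules:
    print(describe_keep(rule))
```

Read this way, the fourth line of the earlier guess should be "every 90 day(s) if older than 730 day(s)", since the rule is 90:730.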

Christoph · 2 September 2019 21:54