Prune Help, Simple mind

I do not know why my simple mind cannot grasp the prune options but it cannot.

Hoping someone can tell me how to set the prune options to achieve what I would like which is very simple.

I basically want anything in my cloud storage that was deleted from my local machine more than 60 days ago to be removed from the cloud storage.

That’s it, simple.

Why I can’t figure out how to do this is beyond me.

You can’t do that.

Prune parameters control the lifecycle of snapshots, not individual files.

Snapshots are immutable.

You can either prune a whole snapshot or keep it in the store. You cannot wipe individual files.

Running prune -keep 0:60 will remove snapshots older than 60 days. You will end up with a rolling 60-day version history.

If you have a highly volatile subset of data that you don’t/can’t/shouldn’t keep history of forever, splitting the backup into two might be a better approach, maybe with filters. You would have one backup with infinite history and another with a rolling cutoff. How feasible that is would be specific to your actual use case.
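For the split approach, a minimal sketch of what the first filters file might look like - the volatile/ directory name is purely a placeholder, and the exact pattern semantics are described in the Duplicacy filters documentation:

```
# .duplicacy/filters for the long-history backup
# (hypothetical layout: keep everything except the volatile subtree)
-volatile/
```

The second, short-retention backup would invert this to include only volatile/, and each backup would then get its own prune schedule.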

I don’t want to keep ANYTHING in my backup forever.

So what would you recommend my prune settings be if I simply want anything I deleted off my computer to remain in my backup for 60 days, or any number for that matter? I want my backup to have a little buffer for oopsies, but my use case might differ from most. For the most part, I want my backup to match my local file system - but like I said, with a small buffer just in case.

This is useful to protect against unintended changes in your data that you might not notice right away - or within 60 days. There are no drawbacks to infinite version history.

You can’t do that. It’s either all or nothing.

You can control the lifecycle only at snapshot granularity. So if you prune with -keep 0:60, you will keep all versions of the state of your repository for the past 60 days and none older. The key point is that this applies to the whole repository.

That’s a sync, which is not a backup, for the reasons above.

And the small buffer may not be enough. Not trying to persuade you in any way, just making sure you understand the cost/risk/benefit of such a short version history.

Maybe it’s the way you’re describing it, because “anything I deleted off my computer to remain in my backup for 60 days” doesn’t make sense in the context of how Duplicacy does things.

As soon as you mention 60 days, you’re just describing a retention period - the maximum number of days you want revision history for. As Saspus said:

prune -keep 0:60

This is regardless of whether files are deleted or not. Duplicacy isn’t able to distinguish between deleted and active files to give them different retention periods - they just disappear out of the snapshots when those get pruned.

While theoretically you could set up a separate ‘recycle bin’ repository, move deleted files there instead, and run a shorter prune retention - the data would still exist in the older snapshots of the regular repository with the longer retention, and wouldn’t reduce disk usage.

HOWEVER, the opposite could work - having a very short retention for regular snapshots and a 60-day retention for the recycle bin repo. But as Saspus says, this would not be a recommended backup strategy and is akin to just ‘sync’, which is very dangerous. Deleted files aren’t the only way you may lose data - erroneous file modifications and ransomware are the whole reason we have backups, and keeping regular, historic snapshots around gives us a chance to recover from unforeseen moments.

I wouldn’t go as far as saying there are no drawbacks to infinite version history - of course there are; that’s pretty close to ‘archival’ and, depending on how your data changes over time, could be significantly costly in extra differential data. But I think you’d be surprised at how little extra overhead is involved in keeping many snapshots, at different retention periods (daily vs weekly vs monthly), for longer.

Honestly, consider using a regular retention policy:

-keep 0:365 -keep 7:60 -keep 1:14

And monitor the growth of data via check.
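To see what that schedule actually keeps, here’s a rough simulation of the tiered -keep logic (my own approximation of the pruning algorithm, assuming one backup per day; the real implementation may bucket snapshots slightly differently):

```python
def retained(ages_in_days, rules):
    """Approximate which snapshot ages survive a tiered -keep schedule.

    rules is a list of (n, m) pairs, ordered by m descending, meaning
    "for snapshots older than m days, keep one every n days"
    (n == 0 means delete everything older than m days).
    """
    kept = []
    last_kept = {}  # most recently kept age, per rule
    for age in sorted(ages_in_days, reverse=True):  # oldest first
        rule = next(((n, m) for n, m in rules if age > m), None)
        if rule is None:          # newer than every cutoff: keep it
            kept.append(age)
            continue
        n, _ = rule
        if n == 0:                # -keep 0:m - drop it
            continue
        last = last_kept.get(rule)
        if last is None or last - age >= n:
            kept.append(age)      # first (or next spaced-out) one in this tier
            last_kept[rule] = age
    return kept

# Daily snapshots for 400 days under -keep 0:365 -keep 7:60 -keep 1:14
kept = retained(range(1, 401), [(0, 365), (7, 60), (1, 14)])
```

Under this sketch, everything older than a year is gone, snapshots between 60 and 365 days thin out to roughly one per week, and everything newer is kept daily - so a deleted file vanishes for good once the last snapshot containing it ages past 365 days.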

It may be the case that your entire revision history is in fact smaller than your most recent snapshot - thanks to de-duplication and compression. And if it isn’t, consider moving the more regularly-changing data to its own repository with a shorter retention. (e.g. My Thunderbird eats up a huge amount of extra differential data due to large, monolithic files, although I cba yet to move it or migrate to the still-buggy maildir format. Yet I still have a much wider range of snapshots to choose from, compared to that extra overhead.)


Like I said, simple mind, so I appreciate everyone putting up with my questions and trying to help.

So am I understanding right that Duplicacy never removes any files? Or am I once again misunderstanding? I for sure do not want sync, but I understand why it seems like that’s what I’m asking for.

In reality, all I want is for deleted files to EVENTUALLY be deleted from my backups. I do not care in the end if that has to be 365 days or 2 years or whatever, as long as I don’t retain files I have personally deleted that I know for sure I never want to see again. Does that make sense?

Yes, this happens naturally when you use at least -keep 0:X, where X can be anything you like. The 0 means keep no snapshots older than X days.

If you deleted a file, that file won’t feature in subsequent snapshots, and as older snapshots get pruned, roughly after 365 days in our example, they’ll no longer be retrievable.

These are the main reasons. For a certain important file (think of my 2019 financial record, for example), I want it to be present in multiple snapshots, even old ones from several years ago, in case any of the above issues occur with my master copy. If this file gets corrupted and I for some reason don’t notice it, in 1 year I will lose the valid copy of it if I adopt e.g. -keep 0:365.

+1 for this. As I mentioned before, I don’t prune. The storage overhead (for the characteristics of my fileset) is very small and just doesn’t justify the time I would spend managing the scripts that would do this.


+1
I myself never prune important repos for the same reasons.

The only minor downside is that checks may take a long time with thousands of revisions.
